Linear Systems Analysis 

by 



R. K. Prasanth 






Contents

1 Models of Dynamic Systems
1.1 Input-output representations
1.2 State space representations
1.3 Some terminology
1.4 Continuous-time and discrete-time

2 Introduction to Vector Spaces
2.1 Vector spaces and subspaces
2.2 Linear combinations, independence, span and basis
2.3 Coordinates and isomorphisms
2.4 Convex sets and functions
2.5 Some optimization problems

3 Matrix Theory
3.1 Different kinds of matrices
3.2 Numbers and spaces associated with matrices
3.3 Decompositions
3.3.1 QR Factorization
3.3.2 Singular Value Decomposition (SVD)
3.3.3 Spectral Decomposition
3.3.4 Similarity
3.4 Linear equations and optimization problems
3.5 A collection of important results

4 Elements of Complex Analysis
4.1 Functions of one complex variable
4.2 Evaluation of a function at an operator

5 Normed Linear Spaces and Banach Spaces
5.1 Norms and normed linear spaces
5.2 Induced metric, balls and sequences
5.3 Banach spaces
5.3.1 Finite dimensional spaces
5.3.2 Spaces of sequences
5.3.3 Lebesgue spaces
5.3.4 Hardy spaces

6 Inner Product Spaces and Hilbert Spaces
6.1 Inner products
6.2 Hilbert spaces
6.3 Orthogonality
6.4 Projection theorem

7 Equilibrium Point and Linearization
7.1 Equilibrium point
7.1.1 Systems without inputs and outputs
7.1.2 Systems with inputs
7.2 Linearization about an equilibrium point
7.2.1 Systems without inputs
7.2.2 Systems with inputs
7.2.3 Systems with inputs and outputs

8 LTI System Behavior
8.1 Solution of continuous-time LTI equations
8.2 Solution of discrete-time LTI equations

9 Lyapunov (Internal) Stability Notions
9.1 Nonlinear time-invariant systems
9.1.1 Relations between stability notions
9.2 LTI systems
9.3 Lyapunov's direct method

10 Lyapunov Stability of LTI Systems
10.1 Lyapunov equation & inequality
10.2 Main stability theorems for continuous-time LTI systems
10.3 Two stability related properties of systems
10.4 Discrete-time LTI systems
10.5 Lyapunov's indirect method

11 Controllability and Stabilizability
11.1 Controllability
11.2 Stabilizability

12 Observability and Detectability
12.1 Observability
12.2 Detectability
12.3 Duality



Chapter 1 



Models of Dynamic Systems 



This course is about the analysis of models. A model is a mathematical representation of system behavior.
It can be obtained by applying the laws of physics (physics-based modeling), from experimental data
(black-box modeling), or by combining both methods (grey-box modeling). Analysis deals with checking whether
a system model has a certain property. For example: is the system stable? How large is the overshoot
in response to a unit step input? We will define important system properties and provide computational
procedures for analysis.



[Block diagrams. Left: input $u$ enters a box labeled $y = T(u)$. Right: input $u$ enters a box labeled $x = T_1(u, x_0)$, $y = T_2(x, u)$.]

Figure 1.1: Input-output (left) and state space (right) representations



1.1 Input-output representations 

One of the most powerful ways of representing a system is in terms of inputs and outputs. The simplest 
example of such a representation is the linear algebraic equation: 

y = Tu 

where T is a matrix, u is the input and y is the output. The equation specifies how the system (in this
case the matrix T) acts on the input to produce the output. A more general algebraic input-output
representation looks like:

Ay = Bu 






where A and B are matrices. Extending further, when differential operators are involved, we get a dynamical
system. For example,

$$\ddot{y} + a(t)\dot{y} + b(t)y = c(t)u \qquad (1.1)$$

is a dynamical system represented in input-output form. If we denote differentiation with respect to time by
$\mathcal{D}$, then the above equation can be written as:

$$\left(\mathcal{D}^2 + a(t)\mathcal{D} + b(t)\right) y = c(t)u$$

which has the same form as the linear algebraic equation. The coefficients a, b and c are referred to as
system parameters.

An input-output representation relates the signals that go into the box (inputs u) to the signals that come out
of it (outputs y). The block diagram on the left hand side of Figure 1.1 shows this viewpoint. The mathematical
formula contains only u, y and their derivatives. There is no explicit information about what is going on
inside the box. This is the approach pioneered by engineers and scientists in the United States during the
first half of the 1900s.



1.2 State space representations 

State space representations, on the other hand, give us information about what goes on inside the box. They
involve inputs, outputs and a set of variables called internal states (or simply states). Their general form is:

$$\dot{x} = f(x, u), \quad x(0) = x_0 \qquad (1.2a)$$
$$y = h(x, u) \qquad (1.2b)$$

where $x$ is a vector of states, $u$ is the input vector, $y$ is the output vector and $x_0$ is the initial state. Inputs and
outputs are physical quantities that are measured, so they do not depend on our particular point of view. On
the other hand, the state of the box can be expressed using different sets of internal variables (or coordinates
or states). For example, in a spring-mass-damper system, the states could be the position $x$ and the velocity $\dot{x}$,
or they could be $x$ and $\dot{x} + 2bx$. Russian scientists and engineers pioneered this approach to systems.

The differential equations (1.2) in state space models are first order vector differential equations unlike 
the second order differential equation appearing in the input-output model (1.1). An ordinary differential 
equation of order n can always be converted into n first order differential equations by choosing states 
appropriately. For example, in (1.1), choose 

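$$x_1 = y \quad \text{and} \quad x_2 = \dot{y}$$

as the states and form the state vector:

$$x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$$

Then,

$$\dot{x} = \begin{bmatrix} \dot{x}_1 \\ \dot{x}_2 \end{bmatrix}
= \begin{bmatrix} \dot{y} \\ \ddot{y} \end{bmatrix}
= \begin{bmatrix} x_2 \\ -a\dot{y} - by + cu \end{bmatrix}
= \begin{bmatrix} x_2 \\ -ax_2 - bx_1 + cu \end{bmatrix}
= \begin{bmatrix} 0 & 1 \\ -b & -a \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} + \begin{bmatrix} 0 \\ c \end{bmatrix} u
= Fx + Gu$$

which gives us the differential equation in the state space representation. Now, for the output equation
(1.2b), we write:

$$y = \begin{bmatrix} 1 & 0 \end{bmatrix} x$$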

Therefore, only first order vector differential equations need to be considered. 
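
The conversion above can be checked numerically. The following is a minimal Python sketch, assuming constant coefficients a, b and c (the text allows them to depend on time) and a plain forward-Euler integrator. The function names and the numerical values are illustrative assumptions and do not come from the notes.

```python
import numpy as np

def second_order_to_state_space(a, b, c):
    """Return (F, G, H) for y'' + a*y' + b*y = c*u with states x1 = y, x2 = y'."""
    F = np.array([[0.0, 1.0],
                  [-b, -a]])
    G = np.array([[0.0],
                  [c]])
    H = np.array([[1.0, 0.0]])
    return F, G, H

def simulate_euler(F, G, H, x0, u, dt, steps):
    """Integrate x' = F x + G u(t) with forward Euler; return samples of y = H x."""
    x = np.asarray(x0, dtype=float).reshape(-1, 1)
    ys = np.empty(steps)
    for k in range(steps):
        ys[k] = (H @ x).item()                 # output at time k*dt
        x = x + dt * (F @ x + G * u(k * dt))   # Euler step of the state equation
    return ys

# Assumed example values: a = 3, b = 2, c = 2 and a unit step input.
F, G, H = second_order_to_state_space(a=3.0, b=2.0, c=2.0)
y = simulate_euler(F, G, H, x0=[0.0, 0.0], u=lambda t: 1.0, dt=1e-3, steps=20000)
print(y[-1])  # settles near the steady-state value c/b of the second order equation
```

For these values the equilibrium of the state equation under a unit step satisfies $Fx + G = 0$, which gives $x_1 = c/b$, the same steady-state output predicted by (1.1) with all derivatives set to zero.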

We shall spend a great deal of time working with state space models, especially linear time-invariant (LTI) 
systems of the form: 



$$\dot{x} = Ax + Bu$$
$$y = Cx + Du$$



where A, B, C and D are matrices of appropriate dimensions. We shall often refer to the differential 
equation as the state equation and the algebraic equation for y as the output equation. There are three main 
motivations for studying this simple case: (i) all system properties in this simple case can be determined from 
linear algebraic properties of the matrices A, B, C and D that define the state space model, (ii) input-output 
and state space representations of a physical process are essentially equivalent which is very gratifying since 
physical properties should not depend on the modeler's choice of mathematics, and (iii) most real-world 
systems are nonlinear, but they behave like these linear systems in the vicinity of equilibrium points. 



1.3 Some terminology 

A system is nonlinear if at least one of the terms is a nonlinear function of the state variable. For example, 
the system: 



$$\begin{bmatrix} \dot{x}_1 \\ \dot{x}_2 \end{bmatrix} = \begin{bmatrix} x_1^2 + t x_2 \\ x_1 \end{bmatrix}$$
$$y = x_2$$

is nonlinear even though the equations for $\dot{x}_2$ and $y$ are linear. If at least one of the terms depends on time
explicitly, then it is a time-varying equation. The above example is a time-varying nonlinear system.

If all the terms in a state space model are linear functions of the state, then the system is said to be linear. 
The following example is not a linear system: 



$$\begin{bmatrix} \dot{x}_1 \\ \dot{x}_2 \end{bmatrix} = \begin{bmatrix} x_1 + 2x_2 \\ x_1 - 2x_2 \end{bmatrix}$$
$$y = \sin(x_1)$$

because the output equation is nonlinear even though the differential equation is linear. If the terms are
not explicit functions of time, then the equation is time-invariant. The previous example is a nonlinear
time-invariant system.

We will call the systems considered so far homogeneous as they do not contain any external inputs. External
inputs can be control inputs (quantities that we have control over) or exogenous inputs (disturbances,
command inputs, etc.). Systems with such inputs are forced systems. An example of a nonlinear time-varying
forced system is the following:

$$\dot{x} = 5e^{t}/x + u$$

where $x$ is the state and $u$ is the external input.

1.4 Continuous-time and discrete-time 

The independent variable time flows continuously in continuous-time systems. All systems mentioned in
the previous sections and, in fact, models of most physical processes found in the literature are continuous-time.
Discrete-time systems are those in which time jumps discretely. Their general state space form is:

$$x(k+1) = f(x(k), u(k))$$
$$y(k) = h(x(k), u(k))$$

where k denotes discrete time and takes the values 0, 1, 2, .... The first equation above is called a difference
equation and it specifies the states at the next time instant k + 1. Note that the differential equation in
the continuous-time case specifies the rate of change of states. The general form of discrete-time linear
time-invariant systems is:

$$x_{k+1} = Ax_k + Bu_k$$
$$y_k = Cx_k + Du_k$$

where instead of the notation $x(k)$ we use $x_k$, which is easier to write. It turns out that properties of
discrete-time LTI systems can also be expressed in terms of the state space matrices (A, B, C, D).
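
Because a difference equation gives the next state directly, a discrete-time LTI model can be simulated by simple iteration. The sketch below assumes NumPy and a hypothetical helper name `simulate_dlti`; the scalar system used to exercise it is an arbitrary illustration.

```python
import numpy as np

def simulate_dlti(A, B, C, D, x0, inputs):
    """Iterate x_{k+1} = A x_k + B u_k and y_k = C x_k + D u_k over an input sequence."""
    x = np.asarray(x0, dtype=float)
    outputs = []
    for u in inputs:
        u = np.atleast_1d(np.asarray(u, dtype=float))
        outputs.append(C @ x + D @ u)   # output at the current instant k
        x = A @ x + B @ u               # state at the next instant k + 1
    return np.array(outputs), x

# Assumed scalar example: x_{k+1} = 0.5 x_k + u_k, y_k = x_k, unit inputs.
A = np.array([[0.5]])
B = np.array([[1.0]])
C = np.array([[1.0]])
D = np.array([[0.0]])
ys, x_final = simulate_dlti(A, B, C, D, x0=[0.0], inputs=[1.0, 1.0, 1.0])
print(ys.ravel(), x_final)
```

Starting from $x_0 = 0$, the states are 0, 1, 1.5 and 1.75, so the three recorded outputs are 0, 1 and 1.5.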



Chapter 2 



Introduction to Vector Spaces 



Vector spaces give the algebraic structure needed to formulate and solve engineering problems. They form 
the foundation for linear algebra and optimization. This chapter gives a brief description of vector space 
theory. 

2.1 Vector spaces and subspaces 

A field F is a set together with the operations of addition and multiplication. The set must contain an
additive identity and a multiplicative identity, every element must have an additive inverse, and every
non-zero element must have a multiplicative inverse. The operations must be associative, commutative and
distributive. A precise definition can be found in any good textbook on vector spaces or linear algebra [2, 3, 4].
Some examples of fields are the following.

Example 2.1.1 (Real and complex number fields) The set of real numbers denoted by $\mathbb{R}$ along with the
usual addition and multiplication is a field. The additive identity is 0 and the multiplicative identity is 1. The
set of complex numbers $\mathbb{C}$ with the usual addition and multiplication is also a field. △

Example 2.1.2 (Rational number field) The set of rational numbers along with the usual addition and
multiplication is a field. △

The sets mentioned in the above examples have some nice features. They are closed under addition and
multiplication. That is, in $\mathbb{R}$, addition (multiplication) of two real numbers always gives a real number.
This algebraic feature (roughly speaking, like things produce like things under algebraic operations) will be
seen in later definitions. A second feature is that addition and multiplication are commutative, associative
and distributive. In other words, the order in which addition and multiplication between elements are
performed does not matter. This highly gratifying feature will also be seen later. A third feature is that,
except for 0, all elements of these sets have a multiplicative inverse. For example, in $\mathbb{C}$, if $x \neq 0$ is a
complex number, then there is a complex number $y$ such that $xy = 1$. We normally write this inverse as $1/x$.
Thus, the set of integers is not a field because the integer 2 has no multiplicative inverse in the set of
integers.
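
The contrast between the rationals and the integers can be illustrated with Python's standard `fractions` module. This is only a small sketch: the helper names are hypothetical, and the integer check relies on the fact that an integer inverse m of n must satisfy |m| <= |n|.

```python
from fractions import Fraction

def inverse_in_rationals(x):
    """Multiplicative inverse of a non-zero rational number, as an exact Fraction."""
    return 1 / Fraction(x)

def has_integer_inverse(n):
    """Search for an integer m with n * m == 1 among the only possible candidates."""
    return any(n * m == 1 for m in range(-abs(n), abs(n) + 1))

inv = inverse_in_rationals(2)
print(inv, inv * 2 == 1)        # 2 has an inverse among the rationals
print(has_integer_inverse(2))   # no integer m satisfies 2 * m == 1
```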

The elements of a field are called scalars. A field is the underlying object over which vector spaces are
defined. The underlying field in most engineering problems is either $\mathbb{R}$ or $\mathbb{C}$ (some engineering problems do
have exotic fields).

Definition 2.1.1 (Vector space) A vector (or linear) space over a field F is a triplet $(V, +, \cdot)$, where

V is a set of objects called vectors,

$+$ denotes vector space addition between elements of V, and

$\cdot$ denotes scalar multiplication of the elements of V by elements in the field F,

with the following properties:

1. V is closed with respect to addition and scalar multiplication. That is, for any $x$ and $y$ in V and
scalar $\alpha \in F$, the vector sum $x + y$ and the product $\alpha \cdot x$ are in V.

2. Addition and scalar multiplication are associative, commutative and distributive. That is, for any
$x, y, z$ in V and $\alpha, \beta$ in F, we have

$(x + y) + z = x + (y + z)$ (addition is associative)
$x + y = y + x$ (addition is commutative)
$(\alpha\beta) \cdot x = \alpha \cdot (\beta \cdot x)$ (scalar multiplication is associative)
$\alpha \cdot (x + y) = \alpha \cdot x + \alpha \cdot y$ (scalar multiplication distributes over vector addition)
$(\alpha + \beta) \cdot x = \alpha \cdot x + \beta \cdot x$ (scalar multiplication distributes over field addition)

3. There exists a zero element, denoted by 0, in V such that $x + 0 = x$ for any $x$ in V.

4. For each $x \in V$, there exists an additive inverse, denoted by $-x$, such that $x + (-x) = 0$.

5. For each $x \in V$, we have $1 \cdot x = x$.



The elements of V are called vectors, but they are not necessarily vectors in the colloquial sense. As we
shall see in the examples below, they can be matrices, functions or other objects. The notation $\cdot$ for scalar
multiplication is tedious and is almost never used. For example, we normally write $2x$ instead of $2 \cdot x$ to
mean the vector obtained by multiplying the vector $x$ with the scalar 2.

Our motivation comes primarily from engineering applications where addition and scalar multiplication are
defined in certain standard ways. For example, matrix addition is understood as addition of elements of the
matrices, and function addition is performed point-wise. Such familiar notions will be called standard or
usual addition and scalar multiplication. Whenever standard addition and scalar multiplication are used,
we simply say that V is a vector space instead of the awkward notation $(V, +, \cdot)$. As mentioned, the most
common fields used in engineering are $\mathbb{R}$ and $\mathbb{C}$. A vector space is called a real vector space (respectively
complex vector space) when the underlying field is $\mathbb{R}$ (respectively $\mathbb{C}$).



Example 2.1.3 (Space of column vectors $F^n$) Consider the set $F^n$ of column vectors of size n whose
components are elements of the field F:

$$F^n = \left\{ x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} : x_1, x_2, \ldots, x_n \text{ are in } F \right\}$$

Given a column vector $x$, the ith component of $x$ is denoted by $x_i$. Let us define addition and scalar
multiplication in the usual sense. That is, for vectors $x, y \in F^n$, addition of $x$ and $y$ is defined componentwise:

$$x + y = \begin{bmatrix} x_1 + y_1 \\ x_2 + y_2 \\ \vdots \\ x_n + y_n \end{bmatrix}$$

and scalar multiplication by $\alpha \in F$ is also defined componentwise:

$$\alpha x = \begin{bmatrix} \alpha x_1 \\ \alpha x_2 \\ \vdots \\ \alpha x_n \end{bmatrix}$$

It is easy to verify that $F^n$ with this notion of addition and scalar multiplication is a vector space over F.
The key observation in the verification is that vector addition and scalar multiplication as defined above are
induced from field addition and field multiplication (defined over F). The properties of the latter induce the
properties required of $F^n$ to become a vector space. △



Example 2.1.4 (The spaces $\mathbb{R}$, $\mathbb{R}^n$, $\mathbb{C}$, $\mathbb{C}^n$) Concrete examples of $F^n$ can be found by taking the field F to
be $\mathbb{R}$ or $\mathbb{C}$. Thus, $\mathbb{R}$ and $\mathbb{R}^2$ are examples of vector spaces over $\mathbb{R}$ (or real vector spaces), while $\mathbb{C}$ and $\mathbb{C}^2$
are vector spaces over $\mathbb{C}$ (or complex vector spaces). △






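Example 2.1.5 (Spaces of matrices $F^{m \times n}$, $\mathbb{R}^{m \times n}$, $\mathbb{C}^{m \times n}$) Consider the set $F^{m \times n}$ of $m \times n$ matrices
whose components are elements of the field F:

$$F^{m \times n} = \left\{ \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} : a_{ij} \in F \ \ \forall\, i = 1, 2, \ldots, m,\ j = 1, 2, \ldots, n \right\}$$

Define addition and scalar multiplication in the usual component-wise sense: for matrices $x, y \in F^{m \times n}$
and scalar $\alpha \in F$,

$$x + y = [x_{ij} + y_{ij}], \qquad \alpha x = [\alpha x_{ij}]$$

where $x_{ij}$ denotes the (i, j)th entry of $x$. Again, using the observation that these definitions are induced
from field addition and field multiplication, we can show that $F^{m \times n}$ is a vector space over F.

As before, concrete examples are $\mathbb{R}^{m \times n}$ over $\mathbb{R}$ and $\mathbb{C}^{m \times n}$ over $\mathbb{C}$, respectively the real vector space of
$m \times n$ matrices with real entries and the complex vector space of $m \times n$ matrices with complex entries. △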



The above examples illustrate an important process of constructing new vector spaces by taking products
of old ones. For example, the space $\mathbb{R}^3$ of real vectors of size 3 can be thought of as the Cartesian product
of the spaces $\mathbb{R}^1$ and $\mathbb{R}^2$, or as the Cartesian product of $\mathbb{R}^1$ with itself three times. We state the process
below. It is one of the ways of constructing complicated vector spaces from simpler ones. It enables us
to "combine" and study vastly different objects and will be useful in advanced engineering problems.

Proposition 2.1.1 (Products of vector spaces) Suppose that $(V, +_V, \cdot_V)$ and $(W, +_W, \cdot_W)$ are vector
spaces over the field F. The Cartesian product

$$V \times W = \{(v, w) : v \in V \text{ and } w \in W\}$$

with the following notion of addition $+$ and scalar multiplication $\cdot$:

$$(v_1, w_1) + (v_2, w_2) = (v_1 +_V v_2,\ w_1 +_W w_2) \quad \text{for all } (v_1, w_1) \text{ and } (v_2, w_2) \text{ in } V \times W$$
$$\alpha \cdot (v, w) = (\alpha \cdot_V v,\ \alpha \cdot_W w) \quad \text{for all } \alpha \in F \text{ and } (v, w) \text{ in } V \times W$$

is a vector space over F. ■
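
The componentwise operations of Proposition 2.1.1 can be sketched in a few lines of Python. The example below assumes $V = \mathbb{R}^2$ and $W = \mathbb{R}^{2 \times 2}$, both over $\mathbb{R}$, and represents an element of $V \times W$ as a tuple; the names `add` and `scale` are hypothetical.

```python
import numpy as np

def add(p, q):
    """Addition in V x W: first components are added in V, second components in W."""
    (v1, w1), (v2, w2) = p, q
    return (v1 + v2, w1 + w2)

def scale(alpha, p):
    """Scalar multiplication in V x W: the same scalar acts on each component."""
    v, w = p
    return (alpha * v, alpha * w)

# Two elements of R^2 x R^{2x2} (a vector paired with a matrix).
p = (np.array([1.0, 2.0]), np.eye(2))
q = (np.array([0.5, -1.0]), np.ones((2, 2)))

s = add(p, scale(2.0, q))
print(s[0])  # vector component of p + 2q
print(s[1])  # matrix component of p + 2q
```

The tuple keeps the two components separate, so objects of different kinds (here a vector and a matrix) never need to be stacked into a single array.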



Some remarks about the notation used in the proposition are in order. We represented the elements of the
Cartesian product $V \times W$ as $(v, w)$ where the first component $v$ belongs to V and the second component $w$
belongs to W. This is the standard representation. It has the advantage that the components can be different
objects: vectors, matrices, functions, etc. Thus, we may consider the Cartesian product of $\mathbb{R}$ and $\mathbb{C}^{2 \times 2}$:

$$\mathbb{R} \times \mathbb{C}^{2 \times 2} = \{(x, y) : x \in \mathbb{R} \text{ and } y \in \mathbb{C}^{2 \times 2}\}$$

whose typical element looks like:

$$\left( x, \begin{bmatrix} a & b \\ c & d \end{bmatrix} \right)$$

where $x$ is a real number and $a, b, c, d$ are complex numbers. When V and W are collections of objects of
the same kind, it may be useful to represent elements of the Cartesian product as:

$$\begin{bmatrix} v \\ w \end{bmatrix}$$

For example, when $V = \mathbb{R}$ and $W = \mathbb{R}^2$, we may write:

$$\mathbb{R} \times \mathbb{R}^2 = \left\{ \begin{bmatrix} v \\ w_1 \\ w_2 \end{bmatrix} : v \in \mathbb{R} \text{ and } \begin{bmatrix} w_1 \\ w_2 \end{bmatrix} \in \mathbb{R}^2 \right\}$$

which suggests that the Cartesian product of the space of vectors of size 1 ($\mathbb{R}$) with the space of vectors of
size 2 ($\mathbb{R}^2$) gives the space of vectors of size 3.

Another observation concerns the notation used for addition and multiplication. V and W may have
different notions of addition and scalar multiplication. We use subscripts to indicate the notional difference.
Thus, $+_V$ and $\cdot_V$ denote addition and scalar multiplication in the vector space V. The definition of addition
$+$ on the Cartesian product $V \times W$ is induced by the notions of addition in V and W. That is, the first
components are added according to the addition $+_V$ in V and the second components are added according
to $+_W$ in W. So, in general, the addition $+$ in $V \times W$ is an abstract composite notion. Similar comments
hold for scalar multiplication.

Finally, the proposition deals with two vector spaces V and W. But, by repeatedly applying the proposition,
we can get more complex spaces $U \times V \times W \times \cdots$. Indeed, the spaces $F^n$ and $F^{m \times n}$ defined earlier can
be seen as applications of the above fact. For example, $F^n$ is the product space of F taken n times.

Example 2.1.6 (Space of polynomials of degree at most n, $\mathcal{P}_n(T)$) Let T be a subset of $\mathbb{R}$. A real
polynomial $p$ with domain T is a function of the form:

$$p(t) = \sum_{k=0}^{m} p_k t^k$$

defined for all $t \in T$. $p_k$ is called the kth polynomial coefficient. The coefficients of a polynomial are
elements of some vector space. For example, a scalar polynomial is a polynomial whose coefficients are
scalars, while a matrix-valued polynomial has coefficients in $F^{m \times n}$. The zero polynomial is the polynomial
whose coefficients are all equal to zero. If at least one of the coefficients is non-zero, then the polynomial
is referred to as a non-zero polynomial. We say that a non-zero polynomial $p$ has degree n if $t^n$ is the
highest power with a non-zero coefficient (since the coefficients lie in a vector space, zero and non-zero are
well-defined). The zero polynomial has degree zero.

Let $n \geq 0$ be an integer. Let $\mathcal{P}_n(T)$ be the set of all polynomials with domain $T \subset \mathbb{R}$ of degree at most n
whose coefficients are real numbers, i.e.

$$\mathcal{P}_n(T) = \left\{ p : T \to \mathbb{R} \ \text{such that}\ p(t) = \sum_{k=0}^{n} a_k t^k \ \text{for all } t \in T \text{ and for some } a_0, a_1, \ldots, a_n \text{ in } \mathbb{R} \right\}$$

Note that $\mathcal{P}_n(T)$ contains polynomials of degree 0, 1, ..., n as the coefficients are free variables and may
take on the value zero.

Let us define addition and scalar multiplication of polynomials as follows:

$$(p + q)(t) = p(t) + q(t) \quad \forall\, t \in T$$
$$(\alpha p)(t) = \alpha\, p(t) \quad \forall\, t \in T$$

for all $p, q$ in $\mathcal{P}_n(T)$ and $\alpha \in \mathbb{R}$ (say in words the expressions on both sides of equality to fully understand
the definitions).

With these definitions, $\mathcal{P}_n(T)$ is a vector space over $\mathbb{R}$. △



$\mathcal{P}_n(T)$ is a fairly important vector space. It appears in many applications including curve fitting (i.e., finding a
polynomial of degree at most n that fits a given data set). It is also our first example of a space of functions.
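
Pointwise addition and scaling of polynomials of degree at most n amount to adding and scaling their coefficient vectors. The sketch below checks this on sample points using `numpy.polynomial.polynomial.polyval`, which expects coefficients ordered from the constant term upward. The two polynomials and the sample grid are arbitrary assumptions.

```python
import numpy as np
from numpy.polynomial import polynomial as P

# Coefficient vectors (constant term first) of two polynomials of degree at most 2.
p = np.array([1.0, 0.0, 2.0])    # p(t) = 1 + 2 t^2
q = np.array([0.0, 3.0, -2.0])   # q(t) = 3 t - 2 t^2
alpha = 4.0

t = np.linspace(-1.0, 1.0, 5)    # sample points in the domain T

# Pointwise definitions from the text ...
sum_pointwise = P.polyval(t, p) + P.polyval(t, q)
scaled_pointwise = alpha * P.polyval(t, p)

# ... agree with operating on the coefficient vectors directly.
sum_from_coeffs = P.polyval(t, p + q)
scaled_from_coeffs = P.polyval(t, alpha * p)

print(np.allclose(sum_pointwise, sum_from_coeffs))
print(np.allclose(scaled_pointwise, scaled_from_coeffs))
```

Here $p + q$ has coefficient vector (1, 3, 0), so the sum has degree 1. It still lies in the space of polynomials of degree at most 2, which is why the definition uses "at most".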



Example 2.1.7 (Space of polynomials $\mathcal{P}(T)$) As before, let T be a subset of $\mathbb{R}$ and $\mathcal{P}_n(T)$ be the real
vector space of polynomials defined on T of degree at most n whose coefficients are real numbers. Define

$$\mathcal{P}(T) \triangleq \mathcal{P}_0(T) \cup \mathcal{P}_1(T) \cup \mathcal{P}_2(T) \cup \cdots = \bigcup_{n=0}^{\infty} \mathcal{P}_n(T) \qquad (2.2)$$

which is the set of all real polynomials. Define addition and scalar multiplication on $\mathcal{P}(T)$ in the standard
way, i.e., point-wise as in Example 2.1.6. Then, $\mathcal{P}(T)$ is a real vector space. △



The difference between $\mathcal{P}_n(T)$ and $\mathcal{P}(T)$ lies in the bound on the polynomial degree. All the polynomials
in $\mathcal{P}_n(T)$ are of degree at most n. That is, the integer n is a tight upper bound on the degree of the
polynomials in $\mathcal{P}_n(T)$. On the other hand, there is no bound on the degree of polynomials in $\mathcal{P}(T)$. That
is, given any positive integer m, there is a polynomial of degree strictly greater than m in $\mathcal{P}(T)$. Later on,
we will introduce the concept of dimension of a vector space. It will become clear then that $\mathcal{P}_n(T)$ is a
finite dimensional function space while $\mathcal{P}(T)$ is infinite dimensional.



Example 2.1.8 (Space of continuous functions $C([a, b])$) The set of real continuous functions defined on
the interval $[a, b]$ along with pointwise addition and scalar multiplication by real numbers is a real vector
space. It is denoted by $C([a, b])$. △






Example 2.1.9 (Space of solutions of linear differential equations) Consider the differential equation

$$\dot{x} = Ax, \quad x(0) = x_0$$

where A is in $\mathbb{R}^{n \times n}$ and $x_0 \in \mathbb{R}^n$ is the initial condition. Let

$$\phi_{x_0} : [0, \infty) \to \mathbb{R}^n$$

denote the solution starting at $x_0$. Define:

$$V = \{\phi_{x_0} : x_0 \in \mathbb{R}^n\}$$

as the set of all solutions. Then, V is a real vector space with the standard addition and scalar multiplication
of functions. △



Let us consider the differential equation:

$$\dot{x} = -x, \quad x(0) = x_0$$

whose solution:

$$x(t) = e^{-t} x_0 \quad \text{for all } t \geq 0$$

is a continuous function (for a fixed initial condition $x_0$). So, the set of all solutions V consists of continuous
functions on the time interval $[0, \infty)$. Therefore, V is a subset of $C([0, \infty))$. Note that V and $C([0, \infty))$ are
both vector spaces. They also have the same notions of addition and scalar multiplication. This observation
leads to the definition of a subspace, which is as important as a vector space.
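
That the solution set is closed under linear combinations can be checked numerically for this scalar equation. The sketch below assumes NumPy, a hypothetical helper `phi` that returns the solution for a given initial state, and arbitrary values for the weights and initial conditions.

```python
import numpy as np

def phi(x0):
    """Return the solution t -> exp(-t) * x0 of x' = -x with x(0) = x0."""
    return lambda t: np.exp(-t) * x0

t = np.linspace(0.0, 5.0, 11)
alpha, beta = 2.0, -3.0

# A linear combination of two solutions ...
combo = alpha * phi(1.5)(t) + beta * phi(0.5)(t)

# ... is the solution that starts from the combined initial state.
from_combined_state = phi(alpha * 1.5 + beta * 0.5)(t)

print(np.allclose(combo, from_combined_state))
```

The combination is therefore again an element of V, which is what the closure requirement in the vector space definition asks for.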



Definition 2.1.2 (Subspace) A subset U of a vector space V is a subspace of V if and only if U is itself a
vector space with the addition and scalar multiplication defined on V. ★

Example 2.1.10 (Trivial subspaces) The sets {0} and V are the smallest and the largest (in a sense that
shall become precise when we introduce dimension) subspaces of V. △

Example 2.1.11 (Polynomials of degree at most n) Let $\mathcal{P}(T)$ be the real vector space of polynomials
given in Example 2.1.7. Let $n \geq 0$ be an integer. Consider the real vector space $\mathcal{P}_n(T)$ of polynomials
of degree at most n defined in Example 2.1.6. Clearly, $\mathcal{P}_n(T)$ is a subset of $\mathcal{P}(T)$ as the latter contains
polynomials of any degree. $\mathcal{P}_n(T)$ and $\mathcal{P}(T)$ have the same standard notions of addition and scalar
multiplication, namely point-wise addition and scalar multiplication of functions. So, $\mathcal{P}_n(T)$ is a subspace
of $\mathcal{P}(T)$. △

To check if a subset U is a subspace, we may check if U is a vector space. This can be tedious as a number 
of properties need to be verified. The following result gives a simple test. 






Proposition 2.1.2 (Subspace test) Let U be a non-empty subset of a vector space V over the field F. The following
statements are equivalent.

1. U is a subspace of V.

2. For any $x \in U$, $y \in U$, $\alpha \in F$ and $\beta \in F$, we have $\alpha x + \beta y \in U$. ■

When $\alpha = 0$ and $\beta = 0$, the quantity

$$\alpha x + \beta y$$

evaluates to 0. So, a subspace must contain 0.

Example 2.1.12 (Continuous functions with a common zero) Let $C(\mathbb{R})$ be the real vector space of
continuous functions on $\mathbb{R}$ with the standard function addition and scalar multiplication. Define:

$$U \triangleq \{f \in C(\mathbb{R}) : f(0) = 0\}$$

as the subset of all continuous functions that evaluate to zero at 0.

We claim that U is a subspace of $C(\mathbb{R})$. To prove this, note that U is a subset of $C(\mathbb{R})$ and that the function
0 is in U (0 is the function that is zero at every point in the domain). Now, pick any $f$ and $g$ in U and any $\alpha$ and
$\beta$ in $\mathbb{R}$. Define the function:

$$h = \alpha f + \beta g$$

Since $f$ and $g$ are in $C(\mathbb{R})$ and $C(\mathbb{R})$ is a vector space, $h$ is in $C(\mathbb{R})$. Moreover,

$$h(0) = \alpha f(0) + \beta g(0) = 0$$

as $f(0) = 0$ and $g(0) = 0$ based on the fact that $f$ and $g$ were chosen from U. So, $h$ is in U. As these
arguments apply to any $f$ and $g$ in U and any $\alpha$ and $\beta$ in $\mathbb{R}$, we conclude using statement 2 of Proposition 2.1.2
that U is a subspace. △
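
The argument of Example 2.1.12 can be mirrored in code by building the combination $h = \alpha f + \beta g$ pointwise and evaluating it at 0. This is only a numerical illustration for one assumed pair of functions and one pair of weights, not a proof; the helper name `combine` is hypothetical.

```python
import math

def combine(alpha, f, beta, g):
    """Pointwise linear combination h = alpha*f + beta*g of two real functions."""
    return lambda t: alpha * f(t) + beta * g(t)

def f(t):
    return math.sin(t)          # f(0) = 0, so f belongs to U

def g(t):
    return t * math.exp(t)      # g(0) = 0, so g belongs to U

h = combine(2.0, f, -5.0, g)
print(h(0.0) == 0.0)            # h also vanishes at 0, so h belongs to U

# A function outside U for comparison: cos(0) = 1.
k = combine(1.0, f, 1.0, math.cos)
print(k(0.0) == 0.0)            # combining with a function outside U can leave U
```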

Example 2.1.13 (Affine set) Let U be a subspace of a vector space V and $b$ be an element of V. Define

$$A_b = \{y = x + b : x \in U\}$$

as the set of elements of V obtained by translating elements of U by a fixed amount $b$. The subset $A_b$ is
called an affine subset of V. When $b \in U$, $A_b$ is a subspace of V. Otherwise, it is not. △

There are two ways of combining subspaces to produce new subspaces. They are discussed below. Recall 
that a subspace is a vector space by definition. So, the procedures to construct new subspaces given below 
can also be seen as ways of constructing new vector spaces from old ones. 






Proposition 2.1.3 (Intersection of subspaces) Let U and V be subspaces of a vector space W. Then, their
intersection:

$$U \cap V = \{x : x \in U \text{ and } x \in V\}$$

is a subspace of U, V and W. ■



The proof involves the application of Proposition 2.1.2. The expression on the right hand side of the equation 
in the above proposition involves no condition that is specific to vector spaces. In fact, it is the definition 
of intersection of two sets. Unlike intersection, the set-union of subspaces does not necessarily produce a 
subspace. We need a different notion of "union" of subspaces. 
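
A standard counterexample makes the failure of the set-union concrete. The sketch below assumes $W = \mathbb{R}^2$, with U the horizontal axis and V the vertical axis; the membership helpers are hypothetical names introduced only for this illustration.

```python
import numpy as np

def in_U(x):
    """Membership in U, the horizontal axis of R^2 (second component zero)."""
    return bool(np.isclose(x[1], 0.0))

def in_V(x):
    """Membership in V, the vertical axis of R^2 (first component zero)."""
    return bool(np.isclose(x[0], 0.0))

def in_union(x):
    return in_U(x) or in_V(x)

u = np.array([1.0, 0.0])   # an element of U
v = np.array([0.0, 1.0])   # an element of V

# Both vectors lie in the union, but their sum (1, 1) lies on neither axis.
print(in_union(u), in_union(v), in_union(u + v))
```

The union is not closed under addition, so it fails statement 2 of Proposition 2.1.2.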

Definition 2.1.3 (Sum of subspaces) Let U and V be subspaces of a vector space W. The sum of U and
V is given by:

$$U + V = \{z \in W : \text{there exist } x \in U \text{ and } y \in V \text{ such that } z = x + y\}$$

If $U \cap V = \{0\}$, i.e., if zero is the only element in the intersection of the subspaces U and V, then the sum
of U and V is called the direct sum of U and V. The direct sum is denoted by $U \oplus V$. ★

Proposition 2.1.4 (Sum and Direct Sum are subspaces) Let U and V be subspaces of a vector space W.
Then, their sum $U + V$ and their direct sum $U \oplus V$ are both subspaces of W. ■



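Example 2.1.14 Let W be the vector space $\mathbb{R}^3$, the space of column vectors of size 3 with real entries. Let

$$U = \left\{ \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \in \mathbb{R}^3 : x_1 \in \mathbb{R},\ x_2 = 0,\ x_3 = 0 \right\} = \left\{ \begin{bmatrix} x_1 \\ 0 \\ 0 \end{bmatrix} \in \mathbb{R}^3 : x_1 \in \mathbb{R} \right\}$$

and

$$V = \left\{ \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \in \mathbb{R}^3 : x_1 = 0,\ x_2 \in \mathbb{R},\ x_3 = 0 \right\} = \left\{ \begin{bmatrix} 0 \\ x_2 \\ 0 \end{bmatrix} \in \mathbb{R}^3 : x_2 \in \mathbb{R} \right\}$$

It is easy to verify that U and V are subspaces of $W = \mathbb{R}^3$. Their direct sum is

$$U \oplus V = \{z \in W : \text{there exist unique elements } x \in U \text{ and } y \in V \text{ such that } z = x + y\}$$
$$= \left\{ z \in W : \text{there exist } x_1 \in \mathbb{R},\ y_2 \in \mathbb{R} \text{ such that } z = \begin{bmatrix} x_1 \\ 0 \\ 0 \end{bmatrix} + \begin{bmatrix} 0 \\ y_2 \\ 0 \end{bmatrix} \right\}$$
$$= \left\{ \begin{bmatrix} x_1 \\ y_2 \\ 0 \end{bmatrix} : x_1 \in \mathbb{R},\ y_2 \in \mathbb{R} \right\}$$

It is not difficult to see that $U \oplus V$ is a subspace. △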






An interesting property of direct sum of subspaces is given in the next proposition. 

Proposition 2.1.5 (Property of Direct Sum) Let U, V and W be subspaces of a vector space X. $W = U \oplus V$
if and only if

$$W = \{z \in X : \text{there exist unique elements } x \in U \text{ and } y \in V \text{ such that } z = x + y\}$$

■

That is, every vector in the direct sum of U and V can be written as the sum of a unique element in U and a
unique element in V.
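
For the subspaces of Example 2.1.14, the unique decomposition can be computed by solving a small linear system. The sketch below assumes NumPy, represents U and V by one spanning vector each, and picks an arbitrary element z of the direct sum.

```python
import numpy as np

u_basis = np.array([[1.0], [0.0], [0.0]])   # spans U (the x1-axis of R^3)
v_basis = np.array([[0.0], [1.0], [0.0]])   # spans V (the x2-axis of R^3)
M = np.hstack([u_basis, v_basis])           # columns: spanning vectors of U and V

z = np.array([3.0, -2.0, 0.0])              # an element of the direct sum of U and V

# Solve M @ coeffs = z; full column rank means the coefficients are unique.
coeffs, _, rank, _ = np.linalg.lstsq(M, z, rcond=None)
x = coeffs[0] * u_basis[:, 0]               # the component of z in U
y = coeffs[1] * v_basis[:, 0]               # the component of z in V

print(rank == M.shape[1])                   # the decomposition is unique
print(x, y, np.allclose(x + y, z))
```

The rank check reflects the condition $U \cap V = \{0\}$: if the two spanning vectors were parallel, the rank would drop and the coefficients would no longer be unique.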



2.2 Linear combinations, independence, span and basis 

This section introduces some important concepts that are used in many areas including the study of linear 
equations and optimization. 

Definition 2.2.1 (Linear combination) Let $\{x_k\}_{k=1}^{m}$ be a set of vectors in a vector space V (over F). A
linear combination of $\{x_k\}_{k=1}^{m}$ is an element $x$ of V that can be expressed as:

$$x = \sum_{k=1}^{m} \alpha_k x_k \qquad (2.3)$$

for some scalars $\alpha_1, \alpha_2, \ldots, \alpha_m$ in F. ★

A linear combination can be thought of as a weighted sum of vectors. Note that any vector of the weighted
form (2.3) is in V because V is a vector space. So, as the weights $\{\alpha_k\}$ are varied, possibly different
elements of V are generated. The set of all elements generated by this process is very special and is defined
below.

Definition 2.2.2 (Span of $\{x_k\}_{k=1}^{m}$) The span of a set of vectors $\{x_k\}_{k=1}^{m}$ in a vector space V is:

$$\mathrm{Span}\left(\{x_k\}_{k=1}^{m}\right) = \left\{x \in V : x \text{ is a linear combination of } \{x_k\}_{k=1}^{m}\right\} \qquad (2.4a)$$
$$= \left\{x \in V : x = \sum_{k=1}^{m} \alpha_k x_k \text{ for some } \alpha_1, \alpha_2, \ldots, \alpha_m \text{ in } F\right\} \qquad (2.4b)$$

that is, the set of all vectors in V that are linear combinations of $\{x_k\}_{k=1}^{m}$. ★

It is instructive to write the weighted sum representation of a linear combination in the following way when
V is $\mathbb{R}^n$ (or $\mathbb{C}^n$). Let $x_1, x_2, \ldots, x_m$ be vectors in $\mathbb{R}^n$. Define

$$X = \begin{bmatrix} x_1 & x_2 & \cdots & x_m \end{bmatrix}$$

as the $n \times m$ matrix obtained by placing the vectors side by side. So, the kth column of X is the vector $x_k$.
Then, a linear combination $x$ of $\{x_k\}_{k=1}^{m}$ can be written as:

$$x = \sum_{k=1}^{m} \alpha_k x_k = \sum_{k=1}^{m} x_k \alpha_k = \begin{bmatrix} x_1 & x_2 & \cdots & x_m \end{bmatrix} \begin{bmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_m \end{bmatrix} = X\alpha$$

which shows that $x$ is obtained by multiplying the matrix X by a vector $\alpha$. From this viewpoint, the span
of $\{x_k\}_{k=1}^{m}$ is the set of all vectors that can be generated by multiplying X by vectors. We will use this
observation later on to study matrices and systems of linear equations.
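
The identity between the weighted sum and the matrix-vector product is easy to confirm numerically. The following sketch assumes NumPy and two arbitrary vectors in $\mathbb{R}^3$.

```python
import numpy as np

x1 = np.array([1.0, 0.0, 2.0])
x2 = np.array([0.0, 1.0, -1.0])
alpha = np.array([3.0, -2.0])            # weights alpha_1, alpha_2

X = np.column_stack([x1, x2])            # 3 x 2 matrix whose k-th column is x_k

weighted_sum = alpha[0] * x1 + alpha[1] * x2
matrix_product = X @ alpha               # the same linear combination, written as X alpha

print(weighted_sum, matrix_product)
print(np.allclose(weighted_sum, matrix_product))
```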



Definition 2.2.3 (Linear independence) Let {xk}™^ be a set of elements of a vector space V. The set 
{xk}™ = i is linearly independent if and only if 

m 

^2 a k x k = 0 

k=l 

for some scalars 01,02, ■ ■ -,o m implies that oi = 02 = • • • = o m = 0. Otherwise, the set is 
linearly dependent. i< 



Another way of stating linear independence is that the only set of scalars $\{\alpha_k\}$ that satisfies the equation:

$$\sum_{k=1}^m \alpha_k x_k = 0$$

is $\alpha_1 = \alpha_2 = \cdots = \alpha_m = 0$. When the set is linearly dependent, the equation has a solution with at least one non-zero $\alpha_k$. Recall the definition of a linear combination of $\{x_k\}_{k=1}^m$ and the discussion following it using the matrix X whose columns are the $x_k$'s. Linear independence means that the linear system of equations

$$X\alpha = 0$$

has one and only one solution, namely $\alpha = 0$. Equivalently, linear dependence means that this system of equations has non-zero solutions.
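For vectors in $\mathbb{R}^n$, the condition that $X\alpha = 0$ has only the zero solution can be tested on a computer. The sketch below assumes NumPy and uses the standard fact that this holds exactly when the rank of X equals its number of columns; the function name and the tolerance are illustrative choices.

```python
import numpy as np

def is_linearly_independent(vectors, tol=1e-10):
    """Return True if the given vectors in R^n are linearly independent.

    The vectors are placed side by side as the columns of X; X @ alpha = 0
    has only the zero solution exactly when rank(X) equals the number of columns.
    """
    X = np.column_stack(vectors)
    return bool(np.linalg.matrix_rank(X, tol=tol) == X.shape[1])

independent = is_linearly_independent([np.array([1.0, 1.0]), np.array([1.0, -1.0])])
dependent = is_linearly_independent([np.array([1.0, 2.0]), np.array([2.0, 4.0])])
print(independent, dependent)
```

In floating-point arithmetic the rank is decided up to a tolerance, so nearly dependent vectors may be classified either way.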

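Example 2.2.1 Consider the vectors

$$x_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix} \quad \text{and} \quad x_2 = \begin{bmatrix} 1 \\ -1 \end{bmatrix}$$

in $\mathbb{R}^2$. The span of $x_1$ and $x_2$ is:

$$\mathrm{Span}(x_1, x_2) = \left\{ \alpha \begin{bmatrix} 1 \\ 1 \end{bmatrix} + \beta \begin{bmatrix} 1 \\ -1 \end{bmatrix} : \alpha, \beta \in \mathbb{R} \right\} = \left\{ \begin{bmatrix} \alpha + \beta \\ \alpha - \beta \end{bmatrix} : \alpha, \beta \in \mathbb{R} \right\} = \left\{ \begin{bmatrix} \gamma \\ \delta \end{bmatrix} : \gamma, \delta \in \mathbb{R} \right\} = \mathbb{R}^2$$

These computations can be explained very clearly using Figure 2.1 where $x_1$ and $x_2$ are shown by the directed arrows. The vectors $x_1$ and $x_2$ are perpendicular to each other. So, every vector in the plane can be expressed in terms of $x_1$ and $x_2$. Hence, the span is $\mathbb{R}^2$.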




Figure 2.1: Representation of $x_1$ and $x_2$ of Example 2.2.1

Do the two vectors form a linearly independent set? To answer this question, we appeal to the definition of linear independence. Accordingly, we must examine the condition:

$$\alpha_1 x_1 + \alpha_2 x_2 = 0$$

and must conclude that $\alpha_1 = 0$ and $\alpha_2 = 0$. Let us expand out the left hand side of the above equation:

$$\alpha_1 \begin{bmatrix} 1 \\ 1 \end{bmatrix} + \alpha_2 \begin{bmatrix} 1 \\ -1 \end{bmatrix} = \begin{bmatrix} \alpha_1 + \alpha_2 \\ \alpha_1 - \alpha_2 \end{bmatrix}$$

which shows that the condition to be examined is:

$$\begin{bmatrix} \alpha_1 + \alpha_2 \\ \alpha_1 - \alpha_2 \end{bmatrix} = 0$$

The above equation gives $\alpha_1 + \alpha_2 = 0$ and $\alpha_1 - \alpha_2 = 0$, which hold if and only if $\alpha_1 = 0$ and $\alpha_2 = 0$. Therefore, we conclude that $\{x_1, x_2\}$ is a linearly independent set.



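Example 2.2.2 The matrices

$$x_1 = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \quad x_2 = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}, \quad x_3 = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$$

form a linearly independent set in $\mathbb{R}^{2 \times 2}$. To see this, suppose that

$$\alpha_1 x_1 + \alpha_2 x_2 + \alpha_3 x_3 = 0$$

for some scalars $\alpha_1, \alpha_2, \alpha_3$. Then, we have:

$$\begin{bmatrix} \alpha_1 & 0 \\ 0 & 0 \end{bmatrix} + \begin{bmatrix} 0 & 0 \\ 0 & \alpha_2 \end{bmatrix} + \begin{bmatrix} 0 & \alpha_3 \\ \alpha_3 & 0 \end{bmatrix} = \begin{bmatrix} \alpha_1 & \alpha_3 \\ \alpha_3 & \alpha_2 \end{bmatrix} = 0$$

which gives $\alpha_1 = 0$, $\alpha_2 = 0$ and $\alpha_3 = 0$ as required. What is the span of these three matrices?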



Proposition 2.2.1 (Span is a subspace) Let $\{x_k\}_{k=1}^m$ be a set of elements of a vector space V. Then, their span is a subspace of V. ■

In view of this proposition, span is often referred to as the subspace spanned or generated by the set $\{x_k\}_{k=1}^m$. Span has a very nice property. Suppose a new vector $x_{m+1}$ is added to the collection $\{x_k\}_{k=1}^m$. Then, the span of the new collection $\{x_k\}_{k=1}^{m+1}$ is at least as large as the span of $\{x_k\}_{k=1}^m$, if not bigger. This is stated precisely in the following result.

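Proposition 2.2.2 (Nested sequence) Let $x_1, x_2, \ldots, x_{m+1}$ be elements of a vector space V. The following statements are true.

1. $\mathrm{Span}\left(\{x_k\}_{k=1}^m\right)$ is contained in $\mathrm{Span}\left(\{x_k\}_{k=1}^{m+1}\right)$.

2. Suppose that $\{x_k\}_{k=1}^m$ is linearly independent. Then,

$$\mathrm{Span}\left(\{x_k\}_{k=1}^m\right) = \mathrm{Span}\left(\{x_k\}_{k=1}^{m+1}\right)$$

if and only if $\{x_k\}_{k=1}^{m+1}$ is linearly dependent. ■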



Statement 1 says that 

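$$\mathrm{Span}(x_1) \subseteq \mathrm{Span}(x_1, x_2) \subseteq \cdots \subseteq \mathrm{Span}\left(\{x_k\}_{k=1}^m\right) \subseteq \mathrm{Span}\left(\{x_k\}_{k=1}^{m+1}\right) \subseteq V$$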

which shows the increasing (nested) property of span. Statement 2 gives a condition under which the 
inclusion becomes an equality. 
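The nesting of spans can be observed numerically in $\mathbb{R}^n$ by tracking the dimension of the span as vectors are added one at a time. This is a small sketch, assuming NumPy; it uses the matrix rank as the dimension of the span of the columns, and the four vectors are illustrative.

```python
import numpy as np

def span_dimension(vectors):
    """Dimension of the span of the given vectors in R^n (rank of the column matrix)."""
    return int(np.linalg.matrix_rank(np.column_stack(vectors)))

# Illustrative vectors: x_3 = x_1 + x_2, so adding x_3 does not enlarge the span.
xs = [np.array([1.0, 0.0, 0.0]),
      np.array([0.0, 1.0, 0.0]),
      np.array([1.0, 1.0, 0.0]),
      np.array([0.0, 0.0, 1.0])]

dims = [span_dimension(xs[:m]) for m in range(1, len(xs) + 1)]
print(dims)
```

The dimension never decreases, and it stays the same exactly at the step where the added vector already lies in the span of the earlier ones.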

The definitions of linear combination, span and independence involve a finite number of vectors $\{x_k\}_{k=1}^m$. Their extensions to any collection of vectors are needed in the sequel. We adopt the following extensions to avoid potential technical problems.



Definition 2.2.4 (Case of possibly infinite collection) Let U be a subset of a vector space V.

1. A linear combination of elements of U is an element x in V which can be expressed as

$$x = \sum_{k=1}^m \alpha_k x_k$$

for some finite collection $x_1, x_2, \ldots, x_m$ in U and scalars $\alpha_1, \alpha_2, \ldots, \alpha_m$.

2. The elements of U are linearly independent (or, U is a linearly independent set) if for any finite collection $\{x_k\}_{k=1}^m$ in U,

$$\sum_{k=1}^m \alpha_k x_k = 0$$

implies $\alpha_1 = \alpha_2 = \cdots = \alpha_m = 0$.

3. The span of elements of U is the set of all linear combinations of elements of U.

These definitions clearly reduce to the earlier definitions when U is finite. The important point is that only 
a finite number of elements appear in the weighted sum even when U is an infinite set. Now, let us return 
to the properties of span. Proposition 2.2.1 says that the span is a subspace. This prompts us to ask if a 
subspace is the span of a set of vectors. 

Proposition 2.2.3 (Subspace as span) Let U be a subspace of a vector space V. Then, U is the span of 
elements of U. ■ 

The proposition simply says that every element of a subspace can be expressed as a linear combination of 
elements of the subspace. This is not very useful in an operational sense because a subspace in general has 
an uncountable number of elements. In engineering applications, we will need to manipulate and represent 
subspaces. It would be nice if a subspace can be written as the span of a countable or, better yet, finite 
number of elements. These elements would then capture all the information contained in the subspace in a 
compressed form. In fact, the problems of data and image compression are precisely the problem of finding 
such elements. We introduce some relevant concepts and results. 

Definition 2.2.5 (Dimension of spaces) Let U be a subspace of a vector space V. 

1. U is said to be finite dimensional if and only if U is the span of a finite number of elements in U. Otherwise, U is said to be infinite dimensional.

2. When U is finite dimensional, the minimum number of elements needed to span U is called the dimension of U.

3. When U is infinite dimensional, the dimension of U is $\infty$. ■



Definition 2.2.6 (Basis) Let U be a subspace of a vector space V. A basis for U is a linearly independent set of elements in U whose span is U.



Theorem 2.2.1 (Basis and dimension) Let U be a subspace of a vector space V. The following statements 
are true. 

1. Let B be a basis for U. The dimension of U is equal to the number of elements in B.

2. Suppose that U is finite dimensional. A subset B of U is a basis for U if and only if U is the span of B and the number of elements in B is equal to the dimension of U. ■

The dimension of U is the minimum number of elements needed to span U. So, statement 1 of the theorem 
implies that a basis for U has the minimum number of elements among all subsets that span U. Put dif- 
ferently, if a collection has more elements than the dimension of its span, then the collection is not a basis. 
Statement 2 of the theorem is a special case of statement 1 . A consequence is that every basis for a space 
has the same number of elements (more precisely, there is a one-to-one and onto mapping between different 
bases for a space). 

Example 2.2.3 (Standard basis for $\mathbb{R}^n$, $\mathbb{C}^n$) Let $e_k$ be the n x 1 vector whose entries are all zeros except for the kth element which is 1:

$$e_1 = \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \\ 0 \end{bmatrix}, \quad e_2 = \begin{bmatrix} 0 \\ 1 \\ \vdots \\ 0 \\ 0 \end{bmatrix}, \quad \ldots, \quad e_n = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix}$$

The set of vectors $\{e_1, e_2, \ldots, e_n\}$ is a basis for $\mathbb{R}^n$. This means that (i) the collection is linearly independent, and (ii) every vector in $\mathbb{R}^n$ can be expressed as a linear combination of the collection. We call $\{e_k\}_{k=1}^n$ the standard basis for $\mathbb{R}^n$. Since the basis has n elements, the dimension of $\mathbb{R}^n$ is n.

The collection is also the standard basis for $\mathbb{C}^n$.



Example 2.2.4 (Another basis for $\mathbb{R}^2$) The vectors $x_1$ and $x_2$ given in Example 2.2.1 form a basis for $\mathbb{R}^2$.

Example 2.2.5 (Standard basis for $\mathbb{R}^{m \times n}$, $\mathbb{C}^{m \times n}$) Let $M_{ij}$ be the m x n matrix whose entries are all zeros except for the (i, j) element which is 1. The collection $\{M_{ij} : 1 \le i \le m, \ 1 \le j \le n\}$ is the standard basis for $\mathbb{R}^{m \times n}$ and $\mathbb{C}^{m \times n}$.

Example 2.2.6 (Standard basis for $V_n(T)$) The polynomials $1, t, t^2, \ldots, t^n$ form a basis for the space $V_n(T)$ of polynomials of degree at most n defined in Example 2.1.6. This is the standard basis for $V_n(T)$. The dimension of $V_n(T)$ is n + 1.






Example 2.2.7 (Bernstein basis for $V_n([0, 1])$) The Bernstein polynomials of degree n are defined by:

$$b_{i,n}(t) = \binom{n}{i} t^i (1 - t)^{n-i}$$

for $i = 0, 1, \ldots, n$. These polynomials form a basis for $V_n([0, 1])$.
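As a quick illustration, the Bernstein polynomials can be evaluated directly. The sketch below assumes the standard definition $\binom{n}{i} t^i (1-t)^{n-i}$; the function name is a hypothetical choice. It also checks the well-known property that the n + 1 polynomials of degree n sum to 1 at every t.

```python
from math import comb

def bernstein(i, n, t):
    """Evaluate the i-th Bernstein polynomial of degree n at the point t."""
    return comb(n, i) * t**i * (1 - t)**(n - i)

# The polynomials of a given degree sum to one at any t (binomial theorem).
n, t = 3, 0.3
total = sum(bernstein(i, n, t) for i in range(n + 1))
print(total)
```

The partition-of-unity property is a consequence of the binomial theorem applied to $(t + (1 - t))^n$; it is not by itself a proof that the polynomials form a basis.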

Recall the nesting property of span given in Proposition 2.2.2. We use it to give the following abstract Gram-Schmidt procedure to compute a basis.

Algorithm 2.2.1 (Abstract Gram-Schmidt) Let U be a subspace of a vector space V.

Step 1: Set iteration counter k = 1. Choose a non-zero vector $x_1 \in U$ and set $B_k = \{x_1\}$.
Step 2: If the span of $B_k$ is U, stop. Otherwise, choose a vector $x_{k+1}$ in U but not in the span of $B_k$.
Step 3: Set

$$B_{k+1} = B_k \cup \{x_{k+1}\}$$

and increment iteration counter by 1. Go to Step 2. ■

This algorithm terminates if U is finite dimensional. It produces a sequence of sets {B k } whose spans form 
a strictly increasing sequence. 
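A concrete version of this procedure can be sketched for the case where U is given as the span of a finite list of candidate vectors in $\mathbb{R}^n$. This is an assumption made for the sketch, since the abstract algorithm does not say how U is described. The test "not in the span of $B_k$" is implemented with a rank check using NumPy; the function name and tolerance are illustrative.

```python
import numpy as np

def greedy_basis(candidates, tol=1e-10):
    """Select a basis for the span of the candidate vectors.

    A candidate is added only if it is not in the span of the vectors
    already chosen, i.e. if adding it increases the rank.
    """
    basis = []
    for x in candidates:
        trial = basis + [np.asarray(x, dtype=float)]
        if np.linalg.matrix_rank(np.column_stack(trial), tol=tol) == len(trial):
            basis = trial
    return basis

candidates = [[0.0, 0.0, 0.0], [1.0, 1.0, 0.0], [2.0, 2.0, 0.0], [0.0, 1.0, 0.0]]
basis = greedy_basis(candidates)
print(len(basis))
```

The zero vector and the multiple of an earlier vector are skipped, so the selected set is linearly independent and has the same span as the candidates.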

2.3 Coordinates and isomorphisms 

Elements of vector spaces should be thought of as abstract objects. One of the great consequences of the 
concept of basis is that it allows us to obtain a concrete representation of a vector as an array of numbers 
(scalars). We have been using this representation thus far in examples. It turns out that, in advanced engi- 
neering problems, the abstract version is essential for problem formulation and the concrete representation 
is useful in computations. The array representation gives the coordinates of the vector in a chosen basis. 
This concept is made precise below along with the concept of coordinate change. 

Proposition 2.3.1 (Representation in a basis is unique) Let $\{x_k\}$ be a basis for a vector space V. Then, for each $x \in V$, there exists a unique set of scalars $\{\alpha_k\}$ such that $x = \sum_k \alpha_k x_k$. ■

Suppose that $\{\alpha_k\}$ and $\{\beta_k\}$ are two sets of scalars. Define:

$$x = \sum_k \alpha_k x_k \quad \text{and} \quad y = \sum_k \beta_k x_k$$

Then, according to the proposition, x = y if and only if $\alpha_k = \beta_k$ for each k. This means that all the information needed to distinguish the vector x from any other vector is contained in the scalars $\{\alpha_k\}$ (similarly for y). So, for all practical purposes, the abstraction x can be replaced by the concrete representation $\{\alpha_k\}$. It is customary to stack these scalars one below the other and form a column vector. The vector so obtained is the coordinate of x in the coordinate system defined by $\{x_k\}$.

Definition 2.3.1 (Coordinates and coordinate system) Let $\{x_k\}$ be a basis for a vector space V and x be an element of V. The unique set of scalars $\{\alpha_k\}$ such that

$$x = \sum_k \alpha_k x_k$$

is called the coordinate of x in the basis $\{x_k\}$. The basis is called a coordinate system.

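Example 2.3.1 The standard basis for $\mathbb{R}^2$ is:

$$B_s = \left\{ \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \end{bmatrix} \right\}$$

and in this basis, let x be represented as:

$$x = 1 \begin{bmatrix} 1 \\ 0 \end{bmatrix} + 2 \begin{bmatrix} 0 \\ 1 \end{bmatrix}$$

Then, the coordinates of x in the standard basis are {1, 2}, which is usually written as either (1, 2) or as $[1, 2]^T$ in vector form.

Another basis for $\mathbb{R}^2$ is:

$$B_0 = \left\{ \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \end{bmatrix} \right\}$$

In this basis, the same abstract quantity has the representation:

$$x = 1 \begin{bmatrix} 1 \\ 1 \end{bmatrix} + 1 \begin{bmatrix} 0 \\ 1 \end{bmatrix}$$

so that the coordinates of x are {1, 1} or in the customary form $[1, 1]^T$.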
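For a basis of $\mathbb{R}^n$, finding coordinates amounts to solving a square linear system whose columns are the basis vectors. The following sketch assumes NumPy; the second basis is taken to be $[1, 1]^T$ and $[0, 1]^T$, which is how the example above is read here, and the function name is illustrative.

```python
import numpy as np

def coordinates(basis_vectors, x):
    """Coordinates of x in the given basis of R^n.

    Solves B @ a = x, where the columns of B are the basis vectors.
    """
    B = np.column_stack(basis_vectors)
    return np.linalg.solve(B, np.asarray(x, dtype=float))

x = [1.0, 2.0]
in_standard = coordinates([[1.0, 0.0], [0.0, 1.0]], x)
in_other = coordinates([[1.0, 1.0], [0.0, 1.0]], x)
print(in_standard, in_other)
```

The same vector has coordinates (1, 2) in the standard basis and (1, 1) in the second basis.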

Thus, coordinates always require a coordinate system to be defined. We usually take the coordinate system to be the one defined by the standard basis and do not explicitly mention it. A change of coordinates is often required to better understand and solve problems more efficiently. In the above example, we considered two bases $B_s$ and $B_0$ for $\mathbb{R}^2$. In these bases, the coordinates of the vector x turned out to be $[1, 2]^T$ and $[1, 1]^T$ respectively. That is, when the basis was changed from $B_s$ to $B_0$, the coordinates underwent a transformation. This transformation is very important and is defined below.

Definition 2.3.2 (Coordinate transformation) Let $B_1$ and $B_2$ be two bases for a vector space V. For i = 1, 2, let $x_i$ denote the coordinate of $x \in V$ in the basis $B_i$. Then, the map

$$x_1 \mapsto x_2$$

is called the coordinate transformation from $B_1$ into $B_2$.



The definition is shown graphically in Figure 2.2. The map $Id_i$ is an identification map that associates each element of V with a unique element of the span of the elements of the basis $B_i$. It is an invertible map. We say that the diagram in Figure 2.2 commutes because $T \circ Id_1 = Id_2$ where T denotes the coordinate transformation. It gives a way to explicitly compute the coordinate transformation for finite dimensional vector spaces. The next result shows how and summarizes some special cases.




Figure 2.2: Coordinate transformation (the coordinate transformation maps Span($B_1$) to Span($B_2$))



Proposition 2.3.2 (Computation of coordinate transformation) The following statements are true. 

1. Let $B_1$ and $B_2$ be two bases for a vector space V of dimension n. Let T denote the coordinate transformation from the system $B_1$ into $B_2$. Then, T is completely defined by its action on the elements of $B_1$.

2. Let

$$B_1 = \{x_1, x_2, \ldots, x_n\} \quad \text{and} \quad B_2 = \{y_1, y_2, \ldots, y_n\}$$

be two bases for $\mathbb{R}^n$. Then, the coordinate transformation T from the system $B_1$ into $B_2$ is the solution of:

$$\begin{bmatrix} y_1 & y_2 & \cdots & y_n \end{bmatrix} T = \begin{bmatrix} x_1 & x_2 & \cdots & x_n \end{bmatrix}$$

Moreover, if $\alpha$ is the coordinate of a vector $x \in V$ in the system $B_1$, then $T\alpha$ is the coordinate of x in the system $B_2$, since $x = [x_1 \ \cdots \ x_n]\,\alpha = [y_1 \ \cdots \ y_n]\,(T\alpha)$. ■
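The coordinate transformation between two bases of $\mathbb{R}^n$ can be computed by solving a linear system. The sketch below assumes NumPy and writes X and Y for the matrices whose columns are the elements of $B_1$ and $B_2$ (names introduced here for illustration). Requiring $X\alpha = Y(T\alpha)$ for every $\alpha$ gives $YT = X$, that is, $T = Y^{-1}X$.

```python
import numpy as np

def coordinate_transformation(basis1, basis2):
    """Matrix T that maps coordinates in basis1 to coordinates in basis2.

    With X and Y holding the basis vectors as columns, T solves Y @ T = X.
    """
    X = np.column_stack(basis1)
    Y = np.column_stack(basis2)
    return np.linalg.solve(Y, X)

# Standard basis and the second basis of Example 2.3.1, read as [1,1]^T and [0,1]^T.
T = coordinate_transformation([[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0], [0.0, 1.0]])
new_coords = T @ np.array([1.0, 2.0])
print(new_coords)

# A second illustrative check: both coordinate vectors describe the same x.
B1 = [[2.0, 1.0], [1.0, 1.0]]
B2 = [[1.0, 1.0], [0.0, 1.0]]
a = np.array([1.0, -1.0])
b = coordinate_transformation(B1, B2) @ a
same_vector = np.allclose(np.column_stack(B1) @ a, np.column_stack(B2) @ b)
```

Applied to the coordinates $[1, 2]^T$ in the standard basis, the transformation returns $[1, 1]^T$, in agreement with Example 2.3.1.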

We gave several examples of vector spaces including spaces of vectors, matrices and functions. While 
these spaces are composed of different objects, it is possible to show that under certain conditions they are 
identical for all purposes. This is evident in Proposition 2.3.1 which allows us to transfer an abstract vector 
to an array of scalars. It is also used in Proposition 2.3.2. 

Definition 2.3.3 (Isomorphism) Let V and W be vector spaces over the same field. An isomorphism between V and W is a function $f : V \to W$ that is linear, one-to-one and onto.

When an isomorphism exists between two vector spaces, we say that the vector spaces are isomorphic.



Theorem 2.3.1 (Isomorphic vector spaces) Let V and W be vector spaces over the same field. The following statements are equivalent.

1. V and W are isomorphic

2. The dimension of V is equal to the dimension of W

Suppose that statement 2 holds. Let $\{v_k\}$ be a basis for V and $\{w_k\}$ be a basis for W. Then the mapping

$$v_k \mapsto w_k \quad \text{for each } k,$$

extended linearly to all of V, is an isomorphism between V and W. ■

Accordingly, any real vector space of dimension n is isomorphic to IR n . This is very important as it allows us 
to convert problems on abstract vector spaces into problems on IR n and perform calculations on a computer. 
The following example illustrates this. 

Example 2.3.2 (Polynomial manipulations on a computer) Consider the real vector space $V_n(T)$ of polynomials of degree at most n defined in Example 2.1.6. Recall that the dimension of $V_n(T)$ is n + 1. We will show that $V_n(T)$ is isomorphic to $\mathbb{R}^{n+1}$ by constructing an isomorphism called an identification map. This map will give a straightforward way to manipulate polynomials on a computer.

To construct an identification map, proceed as follows:

1. Choose a basis for $V_n(T)$ and a basis for $\mathbb{R}^{n+1}$, say the standard bases $\{1, t, t^2, \ldots, t^n\}$ for $V_n(T)$ and $\{e_1, e_2, e_3, \ldots, e_{n+1}\}$ for $\mathbb{R}^{n+1}$. Here, $e_k$ is the column vector of size n + 1 whose elements are all zeros except for the kth entry which is one.

2. Pick a polynomial p in $V_n(T)$ and express it in the chosen basis:

$$p(t) = \sum_{k=0}^{n} p_k t^k$$

where $\{p_0, p_1, \ldots, p_n\}$ is the coordinate of p in the standard basis.

3. Define the vector in $\mathbb{R}^{n+1}$:

$$x_p = \sum_{k=1}^{n+1} p_{k-1} e_k = \begin{bmatrix} p_0 & p_1 & \cdots & p_n \end{bmatrix}^T$$

with coordinates $\{p_0, p_1, \ldots, p_n\}$ in the standard basis on $\mathbb{R}^{n+1}$.

4. Define the identification map as the map that takes p to $x_p$. This map can be shown to be linear, one-to-one and onto.
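The identification map in steps 1 to 4 can be sketched in a few lines of code. This is a minimal illustration in plain Python; the function names are hypothetical, and coefficients are listed from the constant term upward, matching the standard basis $1, t, \ldots, t^n$.

```python
def to_vector(coeffs, n):
    """Coefficient vector in R^(n+1) of a polynomial of degree at most n.

    coeffs lists p_0, p_1, ... and is padded with zeros up to length n + 1.
    """
    if len(coeffs) > n + 1:
        raise ValueError("polynomial degree exceeds n")
    return list(coeffs) + [0.0] * (n + 1 - len(coeffs))

def add_polynomials(p, q):
    """Add two polynomials by adding their coefficient vectors entry by entry."""
    return [a + b for a, b in zip(p, q)]

def evaluate(p, t):
    """Evaluate the polynomial with coefficient vector p at the point t."""
    return sum(c * t**k for k, c in enumerate(p))

p = to_vector([1.0, 2.0], 2)        # 1 + 2t
q = to_vector([0.0, 1.0, 3.0], 2)   # t + 3t^2
s = add_polynomials(p, q)           # 1 + 3t + 3t^2
print(s, evaluate(s, 2.0))
```

Adding coefficient vectors and then evaluating gives the same value as evaluating each polynomial and adding the results, which reflects the linearity of the identification map.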



That is, we stack up the polynomial coefficients to form a vector. Since a polynomial is completely determined by its coefficients, we can uniquely identify a polynomial with its coefficient vector. Thus, to add two polynomials on a computer, we simply write a program that accepts coefficient vectors and returns the sum of the input vectors.

2.4 Convex sets and functions

Convexity is perhaps the most important concept in optimization theory. It enables us to determine global properties by studying local properties. For example, a local minimum of a convex function is also its global minimum. Convexity also guarantees the existence of highly efficient algorithms to numerically solve problems. For example, the linear programming problem can be solved efficiently. This section gives a very brief introduction to the subject.

Definition 2.4.1 (Convex set) Let V be a real vector space. A non-empty subset C of V is convex if and only if whenever x and y are in C, the vector

$$\alpha x + (1 - \alpha) y$$

is in C for any $\alpha \in [0, 1]$.

Given x and y in a real vector space V, the line segment connecting them is the set of all points of the form $\alpha x + (1 - \alpha) y$ for $\alpha \in [0, 1]$. So, a non-empty set C is convex if and only if the line segment connecting any two points in C is contained in C.

Example 2.4.1 (Subspaces and affine sets are convex) Every subspace of a real vector space is a convex set. The affine subsets (see definition in Example 2.1.13) of real vector spaces are convex sets.

Example 2.4.2 (A non-convex set) Any finite set with more than one element is not convex. For example, {0, 1} as a subset of $\mathbb{R}$ is not convex.

Example 2.4.3 (Ellipses are convex) Consider $\mathbb{R}^2$ and the ellipse in $\mathbb{R}^2$:

$$\left\{ (x, y) \in \mathbb{R}^2 : \frac{(x - x_c)^2}{a^2} + \frac{(y - y_c)^2}{b^2} \le 1 \right\}$$

where $(x_c, y_c)$ is the center of the ellipse and (a, b) define its major and minor axes. The shaded region in Figure 2.3 is the ellipse. It is a convex set.

Figure 2.3: An ellipse

Example 2.4.4 (Linear equality and inequality) Let $A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$ be given. The set:

$$\{x \in \mathbb{R}^n : Ax \le b\},$$

where $\le$ denotes component-wise less than or equal to, when non-empty is a convex set.



Definition 2.4.2 (Convex combination) Let $\{x_k\}$ be a set of elements in a real vector space V. A convex combination of $\{x_k\}$ is an element x of V that can be expressed as:

$$x = \sum_{k=1}^m \alpha_k x_k \qquad (2.5)$$

for some non-negative real numbers $\alpha_1, \ldots, \alpha_m$ whose sum is 1.



The main difference between linear and convex combinations is that the scalars are free in the former, while 
in the latter, they are non-negative and sum to 1. Recall that the set of all linear combinations is the span. 
We have a similar concept for the set of all convex combinations. 



Definition 2.4.3 (Convex hull of elements) The convex hull of a set of elements $\{x_k\}$ in a real vector space V is

$$\mathrm{Conv}(\{x_k\}) = \{x \in V : x \text{ is a convex combination of } \{x_k\}\} \qquad (2.6a)$$

$$= \left\{x \in V : x = \sum_{k=1}^m \alpha_k x_k \text{ for some } \alpha_1 \ge 0, \alpha_2 \ge 0, \ldots, \alpha_m \ge 0 \text{ such that } \sum_{k=1}^m \alpha_k = 1\right\} \qquad (2.6b)$$

i.e., the set of all elements in V that are convex combinations of $\{x_k\}$.
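Membership in the convex hull of three points in $\mathbb{R}^2$ can be tested by solving for the weights directly. The sketch below assumes NumPy and assumes the three points are not collinear, so that the weights are unique; the vertices and function names are illustrative.

```python
import numpy as np

def convex_weights(vertices, x):
    """Weights a_k with sum(a_k * v_k) = x and sum(a_k) = 1.

    Assumes three non-collinear vertices in R^2, so the 3 x 3 system is invertible.
    """
    V = np.column_stack(vertices)              # 2 x 3, one vertex per column
    M = np.vstack([V, np.ones(V.shape[1])])    # append the row enforcing sum = 1
    rhs = np.append(np.asarray(x, dtype=float), 1.0)
    return np.linalg.solve(M, rhs)

def in_convex_hull(vertices, x, tol=1e-12):
    """x is a convex combination exactly when all weights are non-negative."""
    return bool(np.all(convex_weights(vertices, x) >= -tol))

triangle = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]]
inside = in_convex_hull(triangle, [0.5, 0.5])
outside = in_convex_hull(triangle, [2.0, 2.0])
print(inside, outside)
```

For more than three points, or in higher dimensions, the weights are generally not unique and the test becomes a linear feasibility problem rather than a single linear solve.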



Example 2.4.5 Consider the vector space $\mathbb{R}^2$ and the vectors:

$$x_1 = \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \quad x_2 = \begin{bmatrix} 2 \\ 2 \end{bmatrix} \quad \text{and} \quad x_3 = \begin{bmatrix} 3 \\ 1 \end{bmatrix}$$

as shown in Figure 2.4. The shaded region in the figure is the convex hull of $\{x_1, x_2, x_3\}$. The vectors $x_1$, $x_2$ and $x_3$ form the vertices of the triangular region.

Figure 2.4: An example of convex hull
The shaded region in Figure 2.4 is an example of a polytope. Its definition is given below. 

Definition 2.4.4 (Polytope) Let $\{x_k\}_{k=1}^m$ be a finite set of elements in a real vector space V. The convex hull of $\{x_k\}_{k=1}^m$ is called the polytope generated by $\{x_k\}_{k=1}^m$.

Example 2.4.6 (A polytope of Gaussian density functions) Consider the real vector space $C(\mathbb{R})$ of continuous functions on $\mathbb{R}$ taking values in $\mathbb{R}$. For $k = 1, 2, \ldots, m$, define the Gaussian probability density functions:

$$f_k(x) = \sqrt{\frac{k}{2\pi}} \exp\left(-\frac{k}{2} x^2\right) \quad \text{for all } x \in \mathbb{R}$$

The convex hull of $\{f_k\}_{k=1}^m$ is a polytope in $C(\mathbb{R})$. A typical element of this polytope is a function of the form:

$$\sum_{k=1}^m \alpha_k \sqrt{\frac{k}{2\pi}} \exp\left(-\frac{k}{2} x^2\right)$$

where the coefficients $\{\alpha_k\}$ are all non-negative and sum to 1. Hence, every element of the polytope is a probability density function. In statistics, elements of this polytope are known as mixtures of Gaussian density functions.

Another important geometric object is obtained by generalizing the definition of an ellipse (see Exam- 
ple 2.4.3) to arbitrary vector spaces. We give the definition for IR n . 



Definition 2.4.5 (Ellipsoid) Let $P \in \mathbb{R}^{n \times n}$ be a strictly positive matrix and $x_c \in \mathbb{R}^n$. The set

$$\mathcal{E}(P, x_c) = \left\{x \in \mathbb{R}^n : (x - x_c)^T P^{-1} (x - x_c) \le 1\right\}$$

is called an ellipsoid centered at $x_c$. It is a convex set.

We shall define positive matrices in the next chapter and the above definition will become clearer.



Definition 2.4.6 (Convex function) Let S be a convex subset of a real vector space V. A function $f : S \to \mathbb{R}$ is said to be convex if

$$f(\alpha x + (1 - \alpha) y) \le \alpha f(x) + (1 - \alpha) f(y)$$

for all $\alpha \in [0, 1]$ and x, y in S.



The definition is shown in Figure 2.5. It says that the function always lies below straight lines that connect 
any two function values. It is important to note that convexity of a function depends on its domain S. In 
particular, S must be a convex set. 
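The defining inequality can be probed numerically for functions of one real variable. The sketch below, in plain Python with hypothetical names, samples $\alpha$ on a grid for one pair of points and reports whether the inequality is violated. Such a sampling test can only detect non-convexity; the absence of a violation for one pair of points does not prove convexity.

```python
def violates_convexity(f, x, y, num_points=101, tol=1e-12):
    """Return True if f(a*x + (1-a)*y) > a*f(x) + (1-a)*f(y) for some sampled a in [0, 1]."""
    for i in range(num_points):
        a = i / (num_points - 1)
        if f(a * x + (1 - a) * y) > a * f(x) + (1 - a) * f(y) + tol:
            return True
    return False

square_violated = violates_convexity(lambda t: t**2, -1.0, 2.0)
cube_violated = violates_convexity(lambda t: t**3, -2.0, 0.0)
print(square_violated, cube_violated)
```

The function $t^2$ passes the check on the chosen pair, while $t^3$ lies above its chord on $[-2, 0]$ and is therefore not convex on $\mathbb{R}$.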

Figure 2.5: Function on the left is convex; function on the right is non-convex



Example 2.4.7 (Linear and affine functions are convex) Consider $\mathbb{R}^n$. Let a be a vector in $\mathbb{R}^n$ and b be a real number. Define the function $f : \mathbb{R}^n \to \mathbb{R}$ as follows:

$$f(x) = a^T x + b$$

f is an affine function and, when b = 0, is a linear function. f is convex.



2.5 Some optimization problems 



Let V be a real vector space and S be a subset of V. Let $f : S \to \mathbb{R}$ be a function. Without being precise, the prototypical optimization problem is stated as:

$$\min_{x \in S} f(x)$$

or as

$$\min f(x) \quad \text{subject to } x \in S$$

and consists of two parts:

1. find the minimum value $\gamma_{\min}$, called the minimum function value, taken by the function f over S, and

2. find a point $x_{\min}$ in S, called the optimal solution, such that $f(x_{\min}) = \gamma_{\min}$, i.e., the function achieves the minimum value at $x_{\min}$.

The notation itself gives the following information: f is the function being minimized, x is the unknown (or decision, or search) variable, and S is the feasible set.

Definition 2.5.1 (Convex optimization problem) Let S be a convex subset of a real vector space V and $f : V \to \mathbb{R}$ be a convex function. The minimization problem:

$$\min_{x \in V} f(x) \quad \text{subject to } x \in S$$

is called a convex optimization problem. In words, a convex optimization problem is a minimization problem of a convex function over a convex feasible set.

If S has a finite dimensional characterization, then the convex optimization problem can be efficiently solved. In fact, an algorithm known as the ellipsoidal algorithm can be applied universally. We conclude this chapter with some specific examples of convex optimization problems.

Example 2.5.1 (Linear programming problem) Take $V = \mathbb{R}^n$,

$$S = \{x : Ax \le b\}$$

and f to be $f(x) = c^T x$. Here, $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$ and $c \in \mathbb{R}^n$ are given. The resulting problem:

$$\min_{x \in \mathbb{R}^n} c^T x \quad \text{subject to } Ax \le b$$

is called the linear programming problem. It can be solved with the ellipsoidal algorithm, the simplex method, etc.
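For very small problems, a linear program can be solved by brute force. The sketch below is neither the ellipsoidal algorithm nor the simplex method: it assumes NumPy, assumes the feasible set is non-empty and bounded, and relies on the standard fact that in that case the minimum is attained at a vertex, which is a feasible point where n linearly independent constraints hold with equality. The function name and data are illustrative.

```python
import itertools
import numpy as np

def solve_small_lp(c, A, b, tol=1e-9):
    """Minimize c^T x subject to A x <= b by enumerating vertices.

    Only suitable for tiny problems: the number of candidate vertices
    grows combinatorially with the number of constraints.
    """
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    c = np.asarray(c, dtype=float)
    m, n = A.shape
    best_x, best_val = None, np.inf
    for rows in itertools.combinations(range(m), n):
        A_sub = A[list(rows)]
        if abs(np.linalg.det(A_sub)) < tol:
            continue                      # constraints do not meet at a single point
        x = np.linalg.solve(A_sub, b[list(rows)])
        if np.all(A @ x <= b + tol):      # keep only feasible intersection points
            val = float(c @ x)
            if val < best_val:
                best_x, best_val = x, val
    return best_x, best_val

# Minimize -x1 - x2 over the box 0 <= x1 <= 2, 0 <= x2 <= 3.
A = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
b = [2.0, 3.0, 0.0, 0.0]
x_min, gamma_min = solve_small_lp([-1.0, -1.0], A, b)
print(x_min, gamma_min)
```

In this small instance the optimal solution is the corner (2, 3) with minimum function value -5.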






Example 2.5.2 (Quadratic programming problem) Take $V = \mathbb{R}^n$,

$$S = \{x : Ax \le b\}$$

and f to be $f(x) = x^T Q x + r^T x$. Here, $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$, $Q \in \mathbb{R}^{n \times n}$ and $r \in \mathbb{R}^n$ are given. We assume that Q is a strictly positive matrix. The resulting problem:

$$\min_{x \in \mathbb{R}^n} x^T Q x + r^T x \quad \text{subject to } Ax \le b$$

is called the quadratic programming problem.



Chapter 3 



Matrix Theory 



Matrices arise as representations of linear operators and systems. This chapter begins with a review of 
different types of matrices, and then describes certain numbers and spaces associated with matrices. These 
numbers and spaces are clear evidence of abstractness of matrices over and beyond their representation 
as rectangular arrays. Section 3.3 is an introduction to computational linear algebra and is particularly 
important. The last section presents some applications to linear equations and optimization. 

3.1 Different kinds of matrices 

We shall refer to matrices in $\mathbb{R}^{m \times n}$ as real matrices and matrices in $\mathbb{C}^{m \times n}$ as complex matrices. The transpose and complex conjugate transpose of a matrix A are denoted by $A^T$ and $A^*$ respectively. If A is a real matrix, then $A^* = A^T$. Some occasions require matrices with an infinite number of rows and/or columns and in such cases, we shall explicitly say infinite-sized matrix. On all other occasions, matrices will be understood to have a finite number of rows and columns.

A matrix $A \in \mathbb{R}^{n \times n}$ with the property $A = A^T$ is called symmetric, while a matrix $A \in \mathbb{R}^{n \times n}$ with the property $A = -A^T$ is called skew-symmetric. The only matrix that is both symmetric and skew-symmetric is the zero matrix. A matrix $A \in \mathbb{C}^{n \times n}$ with the property $A = A^*$ is called Hermitian, while a matrix $A \in \mathbb{C}^{n \times n}$ with the property $A = -A^*$ is called skew-Hermitian. Since the complex conjugate of a real matrix is itself, real symmetric matrices are Hermitian. It can be shown that the diagonal elements of skew-symmetric matrices are all equal to zero, and that the diagonal elements of skew-Hermitian matrices are purely imaginary (possibly zero).

A matrix $A \in \mathbb{R}^{n \times n}$ is orthogonal if $A^T A = A A^T$ = a diagonal matrix. It is orthonormal if $A^T A = A A^T = I$ where I denotes the identity matrix of appropriate dimension. A complex matrix is inner if $A^* A = I$ and co-inner if $A A^* = I$. A complex matrix is unitary if it is inner and co-inner. Inner matrices are tall; co-inner matrices are fat; orthonormal matrices (defined only for real matrices) are unitary.

An upper-triangular complex Jordan matrix of size n with eigenvalue $\lambda$, denoted by $J_n(\lambda)$, is a square matrix of the form:

$$J_n(\lambda) = \begin{bmatrix} \lambda & 1 & 0 & \cdots & 0 & 0 \\ 0 & \lambda & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & \lambda & 1 \\ 0 & 0 & 0 & \cdots & 0 & \lambda \end{bmatrix}$$

that is, the elements on the main diagonal are all equal to $\lambda$, the elements on the diagonal above the main diagonal are all equal to 1 and all the remaining elements are zeros. It is an element of $\mathbb{C}^{n \times n}$. There are lower-triangular and real versions of Jordan matrices, but in these notes, we shall only consider the upper-triangular complex version. So, we simply say Jordan matrix of size n with eigenvalue $\lambda$.

A matrix $A \in \mathbb{C}^{n \times n}$ is invertible if and only if there exists a matrix $B \in \mathbb{C}^{n \times n}$ such that AB = I. In this case, B is called the inverse of A and is denoted by $A^{-1}$. It is easy to see that AB = I if and only if BA = I so that there is nothing special about the ordering of A and B we chose while defining an invertible matrix. Orthonormal and unitary matrices are invertible. In fact, the inverse of an orthonormal (unitary) matrix A is $A^T$ (respectively, $A^*$).

A symmetric matrix $A \in \mathbb{R}^{n \times n}$ is positive semi-definite (or simply positive), denoted as $A \ge 0$, if and only if

$$x^T A x \ge 0$$

for all $x \in \mathbb{R}^n$. A symmetric matrix $A \in \mathbb{R}^{n \times n}$ is positive definite (or strictly positive), denoted as $A > 0$, if and only if

$$x^T A x > 0$$

for all $x \ne 0$. It is very important to note the difference in the definitions. We require strict positivity in the latter definition only for non-zero vectors. A matrix A is (strictly) negative if $-A$ is (strictly) positive. We use the notations $A \le 0$ and $A < 0$ to mean A is negative and strictly negative. Finally, given symmetric matrices A and B, we write $A \ge B$ to mean $A - B \ge 0$ (similarly for $A > B$).

Note that every positive real number has at most two square roots which differ by a factor of $-1$ (a unitary matrix). We can extend this notion of a square root to positive matrices. Let $A \in \mathbb{C}^{n \times n}$ be a positive matrix. Then, there exist a positive integer k and a matrix $B \in \mathbb{C}^{n \times k}$ such that $A = BB^*$ (this can be seen from singular value decomposition (SVD) discussed later). B is called a square root of A. Unlike positive numbers, most matrices have an infinite number of square roots. But, like positive numbers, there is a unique positive square root which will be denoted as $A^{1/2}$.

Throughout these notes, we use the word positive to mean greater than or equal to zero, while strictly positive means strictly greater than zero. Similarly for negative and strictly negative.

Proposition 3.1.1 (Some properties of matrices) Let $A \in \mathbb{R}^{n \times n}$, $B \in \mathbb{R}^{n \times n}$ and $C \in \mathbb{R}^{n \times k}$. The following statements are true.



1. Let A and B be symmetric matrices. Then, A + B and C T AC are symmetric matrices. Suppose 
further that A and B commute, that is, AB = BA. Then, AB and BA are symmetric matrices. 

2. Let A and B be orthogonal matrices. Then, AB and BA are orthogonal matrices. 

3. Let A and B be (strictly) positive matrices. Then, $A + B$ and $ABA^T$ are (strictly) positive matrices, and $C^T A C$ is positive. Moreover, $A + B \ge A$ and $A + B \ge B$.

4. There exists a symmetric matrix A (of size strictly greater than 1) such that $A \not\ge 0$, $A \not\le 0$ and $A \ne 0$.

5. If A is a real (complex) matrix, then $AA^T$ (respectively, $AA^*$) is positive. ■



These properties show some of the actions that do not destroy structure. For example, statement 1 says that addition of symmetric matrices produces only symmetric matrices. This should not be surprising since the set of symmetric matrices is a vector space. Multiplication destroys symmetry in general, but preserves orthogonal-ness.

Statement 4 can be a source of confusion. There is a natural way to order real numbers. For example, given a real number x, we have either x > 0 or 0 > x or x = 0. Thus, we can always tell on which side of zero a given real number lies. Clearly, a real number is a symmetric matrix (of size 1 x 1). Unfortunately, the ordering relation on real numbers does not extend to the general case of symmetric matrices. We can only define a partial order, which cannot always tell on which "side" of zero a symmetric matrix lies.
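A small numerical instance of statement 4 may help. The sketch below assumes NumPy and uses an illustrative 2 x 2 diagonal matrix: the quadratic form $x^T A x$ is strictly positive for one vector and strictly negative for another, so the matrix is neither positive nor negative, and it is not zero.

```python
import numpy as np

def quadratic_form(A, x):
    """Value of x^T A x for a real symmetric matrix A and a real vector x."""
    x = np.asarray(x, dtype=float)
    return float(x @ A @ x)

A = np.array([[1.0, 0.0],
              [0.0, -1.0]])

q_first = quadratic_form(A, [1.0, 0.0])    # picks out the entry +1
q_second = quadratic_form(A, [0.0, 1.0])   # picks out the entry -1
print(q_first, q_second)
```

Since the quadratic form takes both signs, this matrix lies on neither "side" of zero in the partial order.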

Let us introduce the following notation. Denote by $e_{nk}$ the vector of size n x 1 whose entries are all zeros except the kth entry which is 1. For example,

$$e_{32} = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}$$

It should be clear that $\{e_{n1}, e_{n2}, \ldots, e_{nn}\}$ is the standard basis for $\mathbb{C}^n$.



Proposition 3.1.2 (Matrix action) Let $A \in \mathbb{C}^{m \times n}$. The following statements are true.

1. Let the kth column of A be given by $a_k \in \mathbb{C}^m$. Then,

$$A e_{nk} = a_k$$

(that is, A acting on $e_{nk}$ pulls out the kth column).

2. Let the (i, j) element of A be given by $a_{ij} \in \mathbb{C}$. Then,

$$e_{mi}^T A e_{nj} = a_{ij}$$

3. A = 0 if and only if Ax = 0 for all $x \in \mathbb{C}^n$. ■



If we think of a matrix $A \in \mathbb{C}^{m \times n}$ as an array of numbers:

$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}$$

then the matrix is completely defined by the numbers $\{a_{ij}\}$. Statements 1 and 2 show how to extract these numbers by applying A to the elements of standard bases for its input and output spaces. Hence, we say that a matrix is completely defined by its action on elements of a basis for its input space. Statement 3 says that the matrix whose action is to map every vector to the zero vector is the zero matrix.
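The extraction of columns and entries by standard basis vectors is easy to check numerically. The following sketch assumes NumPy; the helper name and the 2 x 3 matrix are illustrative, and indices are 1-based to match the notation $e_{nk}$.

```python
import numpy as np

def unit_vector(n, k):
    """Vector e_nk of size n with a 1 in the k-th entry (1-based) and zeros elsewhere."""
    e = np.zeros(n)
    e[k - 1] = 1.0
    return e

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])   # a 2 x 3 matrix, so m = 2 and n = 3

second_column = A @ unit_vector(3, 2)                # A e_32 pulls out column 2
entry_2_3 = unit_vector(2, 2) @ A @ unit_vector(3, 3)  # e_22^T A e_33 pulls out a_23
print(second_column, entry_2_3)
```

The first product returns the second column of A and the second returns the (2, 3) entry.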



3.2 Numbers and spaces associated with matrices 

There are many numbers that can be attached to a matrix. For example, to each square matrix, we can 
attach its determinant. Another number of great importance is an eigenvalue. We can also attach spaces 
with matrices. These numbers and spaces capture properties that are intrinsic to a given matrix. 




Figure 3.1: Input-output representation of a matrix and its complex conjugate transpose 

The input-output representation of a matrix is shown in Figure 3.1. It may be useful to write this representation
as:

y = Ax

where $x \in \mathbb{C}^n$ is the input and $y \in \mathbb{C}^m$ is the resulting output. The complex conjugate transpose of
$A \in \mathbb{C}^{m \times n}$ is a matrix $A^* \in \mathbb{C}^{n \times m}$. Therefore, we can think of $A^*$ as a mapping from $\mathbb{C}^m$ into $\mathbb{C}^n$ as shown
in the figure. The input-output representation of $A^*$ may be written as:

x, = A*y 

It is important to note that these representations are of matrices and their complex conjugate transposes, but 
not of their inverses although at first glance the figure may suggest so. 

In what follows, the definitions and results are stated for complex matrices and remain valid for real matrices. 

Definition 3.2.1 (Eigenvalue-Eigenvector pair) Let $A \in \mathbb{C}^{n \times n}$. A pair $(\lambda, x)$, where $\lambda \in \mathbb{C}$ and $x \in \mathbb{C}^n$,
satisfying

1. $x \neq 0$ and,

2. $Ax = \lambda x$

is called an eigenvalue-eigenvector pair of A. $\lambda$ is an eigenvalue of A and x is an eigenvector associated
with $\lambda$.



By definition, an eigenvector is non-zero. This is an important fact that will be used many times. Also, by
definition, every eigenvalue comes with an eigenvector. But, the eigenvector associated with an eigenvalue
is not unique. For example, assume that $(\lambda, x)$ is an eigenvalue-eigenvector pair for the matrix A. Then,
$(\lambda, cx)$, where c is a non-zero scalar, is also an eigenvalue-eigenvector pair for A (show this by using the
definition). For this reason, an eigenvector is determined at best up to multiplication by a non-zero scalar.

Definition 3.2.2 (Characteristic polynomial and equation) Let $A \in \mathbb{C}^{n \times n}$. The characteristic polynomial
of A is the polynomial $p_A : \mathbb{C} \to \mathbb{C}$ defined as:

$$p_A(s) = \det(sI - A) \quad \text{for all } s \in \mathbb{C}$$

It has degree n. The characteristic equation of A is $p_A(s) = 0$.

An important fact is that the roots of the characteristic polynomial of A are the eigenvalues of A. So, an
n × n matrix can have at most n distinct eigenvalues (this is a consequence of the fundamental theorem of algebra,
later we shall give a geometric argument). It is generally difficult to use the characteristic equation to compute
eigenvalues. It is easier to appeal to the definition; in fact, most numerical methods do.
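The two routes to the eigenvalues can be compared on a small case. The sketch below assumes a 2 × 2 matrix, for which $p_A(s) = s^2 - \mathrm{Tr}(A)\,s + \det(A)$, and uses NumPy; the matrix is an arbitrary choice for illustration.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

# Coefficients of p_A(s) = s^2 - Tr(A) s + det(A), highest power first.
# This closed form is specific to 2-by-2 matrices.
coeffs = [1.0, -np.trace(A), np.linalg.det(A)]

# Route 1: roots of the characteristic polynomial.
roots = np.sort(np.roots(coeffs))

# Route 2: a library eigenvalue routine that does not form p_A explicitly.
eigenvalues = np.sort(np.linalg.eigvals(A))
```

Both routes should return the same two numbers. For larger matrices, forming and rooting the polynomial is usually the less reliable of the two.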

Example 3.2.1 (Eigenvalues of Hermitian and skew-Hermitian matrices) Let A be Hermitian. The
eigenvalues of A are real numbers. To see this, pick an eigenvalue-eigenvector pair $(\lambda, x)$ of A. Then,
$x \neq 0$, $Ax = \lambda x$ and:

$$\lambda x^* x = x^* A x = x^* A^* x = (Ax)^* x = (\lambda x)^* x = \bar{\lambda} x^* x$$

where we used the fact that $A = A^*$. The above equation implies:

$$(\lambda - \bar{\lambda})\, x^* x = 0$$

We claim that $x^* x \neq 0$. This can be seen by writing x in terms of its components and carrying out the
multiplication $x^* x$ and using the fact that $x \neq 0$. Therefore, the above equation implies that

$$\lambda = \bar{\lambda}$$

which means that $\lambda$ is a real number.

Suppose that M is skew-Hermitian. Then, using similar arguments, we can show that the eigenvalues of M
have zero real parts (purely imaginary). △
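A quick numerical check of this example is sketched below. It assumes NumPy and builds a Hermitian and a skew-Hermitian matrix from an arbitrary random matrix; the seed and size are illustrative choices only.

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))

H = B + B.conj().T   # Hermitian: H equals its conjugate transpose
S = B - B.conj().T   # skew-Hermitian: S equals minus its conjugate transpose

# A general-purpose eigenvalue routine is used on purpose, so that nothing
# forces the computed eigenvalues to be real or purely imaginary.
eig_H = np.linalg.eigvals(H)
eig_S = np.linalg.eigvals(S)

max_imag_part_H = np.max(np.abs(eig_H.imag))   # expected to be at rounding level
max_real_part_S = np.max(np.abs(eig_S.real))   # expected to be at rounding level
```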



Consider the matrices

$$A_1 = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} \quad \text{and} \quad A_2 = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} \tag{3.1}$$

which have the same set of eigenvalues, namely 0 repeated twice. $A_1$ has two linearly independent eigenvectors;
but $A_2$ has only one eigenvector (up to multiplication by a non-zero scalar). This observation is tied
to the concept of multiplicity.



Definition 3.2.3 (Algebraic and geometric multiplicities) Let $\lambda$ be an eigenvalue of $A \in \mathbb{C}^{n \times n}$. The algebraic
multiplicity of $\lambda$, denoted by $\mathrm{alg}_A(\lambda)$, is the number of times it repeats as a root of the characteristic
polynomial of A. The geometric multiplicity of $\lambda$, denoted by $\mathrm{geo}_A(\lambda)$, is the number of linearly independent
eigenvectors associated with $\lambda$.



Definition 3.2.4 (Defective and non-defective matrices) Let $A \in \mathbb{C}^{n \times n}$. We say that A is defective
if and only if the algebraic and geometric multiplicities of some eigenvalue of A are not equal. A matrix
that is not defective is called non-defective (that is, for each eigenvalue of A, its algebraic and geometric
multiplicities are equal). ★



Recall that an eigenvalue comes with an eigenvector by definition. So, the number of times an eigenvalue
repeats must be at least as large as the number of linearly independent eigenvectors, but it could be strictly
larger. In other words,

$$\mathrm{alg}_A(\lambda) \geq \mathrm{geo}_A(\lambda) \quad \text{always.}$$

In the case of a defective matrix, there is at least one eigenvalue for which the above inequality is strict. On
the other hand, the inequality is an equality for all the eigenvalues of a non-defective matrix.



Example 3.2.2 The matrices $A_1$ and $A_2$ given in (3.1) have:

$$\mathrm{geo}_{A_1}(0) = 2 \quad \text{and} \quad \mathrm{geo}_{A_2}(0) = 1$$

and

$$\mathrm{alg}_{A_1}(0) = 2 \quad \text{and} \quad \mathrm{alg}_{A_2}(0) = 2$$

Hence, $A_1$ is non-defective and $A_2$ is defective. Note that $A_2$ is the Jordan matrix $J_2(0)$ of size 2 with
eigenvalue 0. It can be shown that $J_n(\lambda)$ with n > 1 is defective. △
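The geometric multiplicities in this example can be reproduced numerically. The sketch below assumes NumPy and uses the fact that the eigenvectors for $\lambda$ are the non-zero solutions of $(\lambda I - A)x = 0$, so that the geometric multiplicity equals n minus the rank of $\lambda I - A$ (a rank-nullity relation that follows from results collected in Section 3.5). The function name is hypothetical.

```python
import numpy as np

def geometric_multiplicity(A, lam):
    """Dimension of the null space of (lam*I - A), computed as n - rank."""
    n = A.shape[0]
    return n - np.linalg.matrix_rank(lam * np.eye(n) - A)

A1 = np.zeros((2, 2))
A2 = np.array([[0.0, 1.0],
               [0.0, 0.0]])

geo_A1 = geometric_multiplicity(A1, 0.0)
geo_A2 = geometric_multiplicity(A2, 0.0)
```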



Proposition 3.2.1 (Some non-defective matrices) Hermitian, skew-Hermitian and unitary matrices are 
non-defective. ■ 






Defective matrices are very difficult to handle computationally. This is because there is a non-defective
matrix that lies arbitrarily close to a given defective matrix. For example,

$$\begin{bmatrix} \epsilon & 1 \\ 0 & 0 \end{bmatrix}$$

where $\epsilon$ is a small non-zero number, is non-defective and lies close to $A_2$ given in (3.1). Thus, it is virtually
impossible to reliably determine multiplicities of defective matrices on a digital computer. The matrix-types
listed in the above proposition are non-defective and, hence, computer-friendly.
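The difficulty can be seen in a small experiment. The sketch below assumes NumPy and the illustrative value $\epsilon = 10^{-8}$: the perturbed matrix has two distinct eigenvalues and two independent eigenvectors, but the eigenvectors are nearly parallel, which shows up as a very large condition number of the eigenvector matrix.

```python
import numpy as np

eps = 1e-8   # illustrative size of the perturbation
A_eps = np.array([[eps, 1.0],
                  [0.0, 0.0]])

eigenvalues, E = np.linalg.eig(A_eps)   # columns of E are eigenvectors

# The two eigenvalues (eps and 0) are distinct, so A_eps is non-defective.
distinct = not np.isclose(eigenvalues[0], eigenvalues[1], rtol=0.0, atol=1e-12)

# The eigenvectors are independent but nearly parallel.
condition_number = np.linalg.cond(E)
```

As $\epsilon$ shrinks, the condition number grows without bound, so a computed decision between "one eigenvector" and "two eigenvectors" depends on rounding errors.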



Definition 3.2.5 (Schmidt pairs and singular values) Let $A \in \mathbb{C}^{m \times n}$. A singular value $\sigma$ of A is a positive
number for which there exists a pair (u, v) of non-zero vectors, where $u \in \mathbb{C}^m$ and $v \in \mathbb{C}^n$, satisfying

$$A^* u = \sigma v \quad \text{and} \quad Av = \sigma u \tag{3.2}$$

The pair (u, v) is called a Schmidt pair associated with $\sigma$. ★



Let us make the following observation. Suppose that (u, v) is a Schmidt pair of A. Then, by definition, (3.2)
holds and

$$(AA^*)u = A(A^* u) = A(\sigma v) = \sigma Av = \sigma^2 u$$

which implies, since u is non-zero, that $(\sigma^2, u)$ is an eigenvalue-eigenvector pair of $AA^*$. Similarly, we can
show that $(\sigma^2, v)$ is an eigenvalue-eigenvector pair of $A^*A$. Thus, $\sigma > 0$ is a singular value of A if and only
if $\sigma^2$ is an eigenvalue of $AA^*$ (and $A^*A$). Note also that the vectors u and v of a Schmidt pair are eigenvectors
of $AA^*$ and $A^*A$ respectively.
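The relations (3.2) and the link to the eigenvalues of $AA^*$ can be checked numerically. The sketch below assumes NumPy, whose `np.linalg.svd` returns factors `U`, `s`, `Vh` with `A = U @ diag(s) @ Vh`; the matrix is an arbitrary example.

```python
import numpy as np

A = np.array([[3.0, 0.0],
              [4.0, 5.0]])

U, s, Vh = np.linalg.svd(A)
sigma = s[0]             # largest singular value
u = U[:, 0]
v = Vh[0, :].conj()      # rows of Vh are conjugate transposes of the columns of V

# Schmidt pair relations (3.2): A^* u = sigma v and A v = sigma u.
residual_1 = np.linalg.norm(A.conj().T @ u - sigma * v)
residual_2 = np.linalg.norm(A @ v - sigma * u)

# The squared singular values should be the eigenvalues of A A^*.
eig_AAh = np.linalg.eigvalsh(A @ A.conj().T)   # returned in ascending order
```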



Example 3.2.3 A matrix whose eigenvalues are all zeros need not be zero (for instance, $A_2$ in (3.1)). But, a matrix
is zero if and only if its largest singular value is zero, that is, if and only if it has no positive singular value. △

Definition 3.2.6 (Range space and null space of a matrix) Let $A \in \mathbb{C}^{m \times n}$. The range space (or simply
range) of A, denoted by $\mathcal{R}(A)$, is the set of all elements in $\mathbb{C}^m$ that can be reached from $\mathbb{C}^n$ by applying A,
that is,

$$\mathcal{R}(A) = \{ y \in \mathbb{C}^m : \text{there exists } x \in \mathbb{C}^n \text{ such that } y = Ax \} = \{ Ax : x \in \mathbb{C}^n \}$$

The null space of A, denoted by $\mathcal{N}(A)$, is given by:

$$\mathcal{N}(A) = \{ x \in \mathbb{C}^n : Ax = 0 \}$$

that is, the set of all elements in $\mathbb{C}^n$ that are mapped to zero by A. ★



Range and null spaces are very important. Many engineering problems reduce to computing these spaces
efficiently. Examples include data compression and separating signal from noise. Note that the definitions
involve systems of linear equations, namely Ax = y in the case of range and Ax = 0 in the case of null
space. So, these spaces are fundamental objects in the theory of linear equations and more generally of
optimization theory.

Figure 3.2: Location of range and null space of A [labels: Input Space, Output Space]

Let us denote the kth column of A by $a_k$ and the kth element of $x \in \mathbb{C}^n$ by $x_k$. So,

$$A = [a_1 \ a_2 \ \cdots \ a_n] \quad \text{and} \quad x = [x_1 \ x_2 \ \cdots \ x_n]^T$$

and

$$Ax = a_1 x_1 + a_2 x_2 + \cdots + a_n x_n = x_1 a_1 + x_2 a_2 + \cdots + x_n a_n$$

where the quantity on the extreme right is the weighted sum of the vectors $\{a_k\}$, in other words, a linear
combination of the columns of A. Now, the range of A is the set of all vectors that can be written as Ax as x ranges
over $\mathbb{C}^n$. This leads us to conclude that the range of A is the set of all linear combinations of the columns of
A, i.e. the span of the columns of A. For this reason, the range of a matrix is also known as the column span.

The definition of null space involves the equation Ax = 0. In the notation we have just introduced, this
equation becomes:

$$x_1 a_1 + x_2 a_2 + \cdots + x_n a_n = 0$$

Recall that an equation of this form appeared in the definition of linear independence in Chapter 2. In fact,
if the only solution of the above equation is $x_1 = x_2 = \cdots = x_n = 0$, then the columns of A are linearly
independent, while a non-zero solution (meaning at least one of the $x_k$'s is non-zero) implies that the columns
are linearly dependent. Thus, the null space of A tells us if the columns of A are linearly independent or
not.



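Example 3.2.4 Consider the matrices $A_1$ and $A_2$ given in (3.1). We have:

$$\mathcal{R}(A_1) = \left\{ \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} x : x \in \mathbb{C}^2 \right\} = \{0\}$$

and

$$\mathcal{R}(A_2) = \left\{ \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} x : x \in \mathbb{C}^2 \right\} = \left\{ \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} : x_1 \in \mathbb{C}, x_2 \in \mathbb{C} \right\} = \left\{ \begin{bmatrix} x_2 \\ 0 \end{bmatrix} : x_2 \in \mathbb{C} \right\} = \mathrm{Span} \left\{ \begin{bmatrix} 1 \\ 0 \end{bmatrix} \right\}$$

(recall the definition of span from (2.4)). Similar computations yield the null spaces. △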



We already mentioned that the range of a matrix is the span of its columns. Since the span of a collection of
vectors is a subspace, the range of a matrix is a subspace. Now, consider the null space of a matrix A. Take any
pair of elements x, y in $\mathcal{N}(A)$. Then, by definition,

Ax = 0 and Ay = 0

So, for any pair of scalars $\alpha$ and $\beta$, we have

$$A(\alpha x + \beta y) = \alpha Ax + \beta Ay = 0$$

which proves that $\alpha x + \beta y$ is in the null space. Therefore, the null space of A is also a subspace.

Theorem 3.2.1 (Range and null space are subspaces) Let $A \in \mathbb{C}^{m \times n}$. The range of A is a subspace of
$\mathbb{C}^m$ and the null space of A is a subspace of $\mathbb{C}^n$. ■

It is important to note that the range and null space of A are in general subspaces of different spaces
(respectively, output space and input space). See Figure 3.2. A great simplification occurs in the case of a
square matrix $A \in \mathbb{C}^{n \times n}$. In this case, $\mathbb{C}^n$ is both the input space and the output space. Although inputs
and outputs may correspond to quantities that are physically different, by isomorphism, the spaces are really
two copies of the same space.

Definition 3.2.7 (Rank and nullity of a matrix) Let $A \in \mathbb{C}^{m \times n}$. The rank of A is the dimension of the
range of A. The nullity of A is the dimension of the null space of A. ★

If A has n columns, then the span of the columns of A cannot have dimension greater than n. So, the rank 
of a matrix is less than or equal to the number of columns. Rank is also less than or equal to the number of 
rows, but we have to wait a while to see it. 

Proposition 3.2.2 Let $A \in \mathbb{C}^{m \times n}$ and $B \in \mathbb{C}^{n \times k}$. The range of AB is contained in the range of A. The
null space of AB contains the null space of B. ■

We conclude this section with another number associated with a square matrix. 

Definition 3.2.8 (Trace of a square matrix) Let $A \in \mathbb{C}^{n \times n}$. The trace of A, denoted by $\mathrm{Tr}(A)$, is the sum
of the diagonal elements of A. ★






Trace and determinants are related to the other numbers introduced earlier. 

Proposition 3.2.3 (Properties of determinant and trace) Let $A \in \mathbb{C}^{n \times n}$ and $B \in \mathbb{C}^{n \times n}$. The following
statements are true.

1. The determinant of A is the product of the eigenvalues of A. The trace of A is the sum of the
eigenvalues of A. (Each eigenvalue is counted as many times as its algebraic multiplicity.)

2. The determinant of AB is equal to the product of the determinants of A and B.
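Both statements are easy to test on a small case. The sketch below assumes NumPy; the matrices are arbitrary choices for illustration.

```python
import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [0.0, 3.0, 1.0],
              [1.0, 0.0, 1.0]])
B = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 0.0],
              [3.0, 0.0, 2.0]])

eigenvalues = np.linalg.eigvals(A)   # may be complex even for a real matrix

# Statement 1: determinant = product of eigenvalues, trace = sum of eigenvalues.
det_matches = np.isclose(np.linalg.det(A), np.prod(eigenvalues))
trace_matches = np.isclose(np.trace(A), np.sum(eigenvalues))

# Statement 2: det(AB) = det(A) det(B).
product_rule = np.isclose(np.linalg.det(A @ B),
                          np.linalg.det(A) * np.linalg.det(B))
```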



3.3 Decompositions 

Decompositions are used to reveal properties of matrices. An easy decomposition that follows from the
vector space structure of the set of real square matrices is the additive decomposition of a matrix into a
symmetric part and a skew-symmetric part. That is, $A \in \mathbb{R}^{n \times n}$ can be written as:

$$A = X + Y$$

where X is symmetric and Y is skew-symmetric. In fact,

$$X = \frac{1}{2}\left(A + A^T\right) \quad \text{and} \quad Y = \frac{1}{2}\left(A - A^T\right)$$

We shall examine decompositions that are multiplicative, that is, those that express a given matrix as the
product of other matrices. The most important such decompositions are QR, Jordan (or spectral), singular
value, and Schur. The Jordan decomposition for non-defective matrices becomes the eigenvalue-eigenvector
decomposition. QR and Schur decompositions are the easiest to compute (on a digital computer) followed
by singular value decomposition (SVD). Jordan decompositions are next to impossible to compute, but are
very useful in proving theorems.

3.3.1 QR Factorization

Consider a system of linear equations of the form Ax = b where x is the unknown variable. In many
situations, A has more rows than columns, meaning that there are more equations to be satisfied than the
number of unknowns. QR factorization is useful in solving the system of equations in such cases.

Theorem 3.3.1 (QR factorization) Let $A \in \mathbb{C}^{m \times n}$ with $m \geq n$. There exist a positive integer $r \leq n$, a
unitary matrix $Q \in \mathbb{C}^{m \times m}$, a permutation matrix $\Pi \in \mathbb{C}^{n \times n}$ and a matrix $R \in \mathbb{C}^{m \times n}$ with the structure

$$R = \begin{bmatrix} R_{11} & R_{12} \\ 0_{(m-r) \times r} & 0_{(m-r) \times (n-r)} \end{bmatrix}$$

where $R_{11}$ is upper-triangular and has rank r, such that $A\Pi = QR$.






The matrices Q and R are called the QR factors of A, and r is the rank of A. There are many efficient 
procedures for computing QR [6]. The most important uses of QR factorization are in computing the range 
and null spaces of matrices and in solving linear equations. 

Theorem 3.3.2 (Properties of QR) Let $A \in \mathbb{C}^{m \times n}$ with $m \geq n$ and (Q, R) be the QR factors of A as in
Theorem 3.3.1. Let r be the rank of A and partition Q as:

$$Q = [Q_1 \ Q_2]$$

where $Q_1$ consists of the first r columns of Q and $Q_2$ consists of the remaining m − r columns of Q. The
following statements are true.

1. A basis for the range of A is given by the columns of $Q_1$. That is, $\mathcal{R}(A) = \mathcal{R}(Q_1)$

2. A basis for the null space of $A^*$ is given by the columns of $Q_2$. That is, $\mathcal{N}(A^*) = \mathcal{R}(Q_2)$

3. $A\Pi = Q_1 [R_{11} \ R_{12}]$ where $R_{11}$ and $R_{12}$ are as in Theorem 3.3.1. ■

The basis given by QR consists of mutually orthogonal vectors of unit length (we have not yet defined
orthogonality). This follows from the fact that Q is unitary.
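The statements of Theorem 3.3.2 can be illustrated numerically. The sketch below assumes NumPy, whose `np.linalg.qr` performs no column pivoting; it therefore corresponds to the case $\Pi = I$, which is adequate here only because the example matrix is chosen to have linearly independent columns (so r = n).

```python
import numpy as np

# A 4-by-2 matrix with linearly independent columns (so r = n = 2).
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
m, n = A.shape
r = np.linalg.matrix_rank(A)

# mode="complete" returns the full m-by-m unitary Q and the m-by-n R.
Q, R = np.linalg.qr(A, mode="complete")
Q1, Q2 = Q[:, :r], Q[:, r:]

# Statement 1: projecting A onto the span of the columns of Q1 leaves it unchanged.
range_residual = np.linalg.norm(Q1 @ Q1.conj().T @ A - A)

# Statement 2: the columns of Q2 are mapped to zero by A^*.
null_residual = np.linalg.norm(A.conj().T @ Q2)

# Statement 3 with Pi = I: A = Q1 [R11 R12].
factor_residual = np.linalg.norm(Q1 @ R[:r, :] - A)
```

For rank-deficient matrices a column-pivoted QR routine would be needed to obtain the structure of Theorem 3.3.1.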



3.3.2 Singular Value Decomposition (SVD) 

Schmidt pairs lead to the following decomposition known as Singular Value Decomposition (SVD). It is 
perhaps the most important decomposition and is frequently used in data compression, document searching 
etc. 



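Theorem 3.3.3 (Singular Value Decomposition (SVD)) Let $A \in \mathbb{C}^{m \times n}$. There exist a positive integer
$r \leq \min\{m, n\}$ and unitary matrices $U \in \mathbb{C}^{m \times m}$ and $V \in \mathbb{C}^{n \times n}$ such that

$$A = U \begin{bmatrix} \Sigma_1 & 0_{r \times (n-r)} \\ 0_{(m-r) \times r} & 0_{(m-r) \times (n-r)} \end{bmatrix} V^* \tag{3.3}$$

where $\Sigma_1$ is an r × r diagonal matrix:

$$\Sigma_1 = \begin{bmatrix} \sigma_1 & 0 & 0 & \cdots & 0 \\ 0 & \sigma_2 & 0 & \cdots & 0 \\ 0 & 0 & \sigma_3 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & \sigma_r \end{bmatrix} \tag{3.4}$$

with $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_r > 0$ (that is, ordered from the largest to the smallest).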






$\sigma_k$ is called the kth non-zero singular value of A. Singular values were defined earlier and are the positive
square roots of the non-zero eigenvalues of $AA^*$. By our convention, $\sigma_1$ denotes the largest singular value and $\sigma_r$ is
the smallest non-zero (when one exists) singular value. The total number of strictly positive singular values
is r and it is equal to the rank of A.

Theorem 3.3.4 (Properties of SVD) Let $A \in \mathbb{C}^{m \times n}$ and consider its SVD given in Theorem 3.3.3. Partition
U and V as:

$$U = [U_1 \ U_2] \quad \text{and} \quad V = [V_1 \ V_2]$$

where $U_1$ and $V_1$ consist of the first r columns of U and V respectively, and $U_2$ and $V_2$ consist of the
remaining m − r columns of U and the remaining n − r columns of V respectively. The following statements
are true.

1. A basis for the range of A is given by the columns of $U_1$. That is, $\mathcal{R}(A) = \mathcal{R}(U_1)$

2. A basis for the null space of A is given by the columns of $V_2$. That is, $\mathcal{N}(A) = \mathcal{R}(V_2)$

3. A basis for the range of $A^*$ is given by the columns of $V_1$. That is, $\mathcal{R}(A^*) = \mathcal{R}(V_1)$

4. A basis for the null space of $A^*$ is given by the columns of $U_2$. That is, $\mathcal{N}(A^*) = \mathcal{R}(U_2)$

5. The columns of U and V are eigenvectors of $AA^*$ and $A^*A$ respectively. That is,

$$AA^* U = U \begin{bmatrix} \Sigma_1^2 & 0_{r \times (m-r)} \\ 0_{(m-r) \times r} & 0_{(m-r) \times (m-r)} \end{bmatrix} \quad \text{and} \quad A^*A V = V \begin{bmatrix} \Sigma_1^2 & 0_{r \times (n-r)} \\ 0_{(n-r) \times r} & 0_{(n-r) \times (n-r)} \end{bmatrix}$$

6. The matrix A can be written as:

$$A = U_1 \Sigma_1 V_1^*$$

where $\Sigma_1$ is as in Theorem 3.3.3.
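The partition used in Theorem 3.3.4 can be formed directly from a computed SVD. The sketch below assumes NumPy; the threshold `tol` used to decide which singular values count as non-zero is a hypothetical choice for this illustration, and the matrix is built to have rank 2.

```python
import numpy as np

# A rank-2 matrix: the third column is the sum of the first two.
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0],
              [2.0, 0.0, 2.0]])

U, s, Vh = np.linalg.svd(A)       # full SVD: U is 4x4, Vh is 3x3
tol = 1e-10                       # hypothetical threshold for "non-zero"
r = int(np.sum(s > tol))          # number of non-zero singular values = rank

V = Vh.conj().T
U1, U2 = U[:, :r], U[:, r:]
V1, V2 = V[:, :r], V[:, r:]
Sigma1 = np.diag(s[:r])

null_residual = np.linalg.norm(A @ V2)                               # statement 2
adjoint_null_residual = np.linalg.norm(A.conj().T @ U2)              # statement 4
compact_residual = np.linalg.norm(U1 @ Sigma1 @ V1.conj().T - A)     # statement 6
```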



3.3.3 Spectral Decomposition 

The spectrum of a matrix is the set of its eigenvalues. Spectral decomposition is a decomposition that 
explicitly shows the eigenvalues and their multiplicity structures. 



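Theorem 3.3.5 (Complex Jordan form) Let $A \in \mathbb{C}^{n \times n}$. There exists an invertible matrix M such that

$$A = MJM^{-1}$$

where

$$J = \begin{bmatrix} J_{n_1}(\lambda_1) & 0 & \cdots & 0 \\ 0 & J_{n_2}(\lambda_2) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & J_{n_m}(\lambda_m) \end{bmatrix}$$

and $J_{n_k}(\lambda_k)$ is the $n_k \times n_k$ upper-triangular Jordan matrix with eigenvalue $\lambda_k$.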



The block-diagonal matrix J is called the Jordan form of A. Since each diagonal block is upper-triangular, J is
itself upper-triangular with the $\lambda_k$'s on its diagonal. So, the $\lambda_k$'s are eigenvalues of A. Every matrix can be
put in its Jordan form. To put a matrix in its Jordan form, we compute a specific similarity transform and apply
the transformation. This may be a difficult task because, in general, matrices can be defective. But, the Jordan
forms of non-defective matrices can be computed more reliably and turn out to be diagonal. The resulting
decomposition is often called the eigenvalue-eigenvector decomposition.

Theorem 3.3.6 (Eigenvalue-eigenvector decomposition) Let $A \in \mathbb{C}^{n \times n}$ be non-defective. There exists an
invertible matrix E such that

$$A = E \Lambda E^{-1}$$

where

$$\Lambda = \mathrm{diag}[\lambda_1, \lambda_2, \ldots, \lambda_n]$$

and the $\lambda_k$'s are eigenvalues of A. ■

We can write the eigenvalue-eigenvector decomposition in the more revealing form:

$$AE = E\Lambda$$

Now, let $v_k$ denote the kth column of E:

$$E = [v_1 \ v_2 \ \cdots \ v_n]$$

and carry out the matrix multiplications using the diagonal structure of $\Lambda$ to obtain:

$$AE = [Av_1 \ Av_2 \ \cdots \ Av_n] \quad \text{and} \quad E\Lambda = [\lambda_1 v_1 \ \lambda_2 v_2 \ \cdots \ \lambda_n v_n]$$

so that the eigenvalue-eigenvector decomposition is really

$$Av_1 = \lambda_1 v_1, \quad Av_2 = \lambda_2 v_2, \quad \ldots, \quad Av_n = \lambda_n v_n$$

This shows that the columns of E are the eigenvectors associated with the eigenvalues of A. Hence, the
terminology eigenvalue-eigenvector decomposition. Similar manipulations when carried out with the Jordan
decomposition lead to the notion of a generalized eigenvector.
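The column-by-column reading of $AE = E\Lambda$ can be checked numerically. The sketch below assumes NumPy, whose `np.linalg.eig` returns the eigenvalues together with a matrix whose columns are corresponding eigenvectors; the matrix is an arbitrary non-defective example.

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

eigenvalues, E = np.linalg.eig(A)   # columns of E are eigenvectors
Lam = np.diag(eigenvalues)

# A E = E Lambda, read column by column: A v_k = lambda_k v_k.
column_residuals = [np.linalg.norm(A @ E[:, k] - eigenvalues[k] * E[:, k])
                    for k in range(A.shape[0])]

# A = E Lambda E^{-1}; the inverse exists because A is non-defective.
reconstruction_residual = np.linalg.norm(E @ Lam @ np.linalg.inv(E) - A)
```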

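Example 3.3.1 (Hermitian matrices) Let $A \in \mathbb{C}^{n \times n}$ be Hermitian. Then, the eigenvalue-eigenvector
decomposition of A has the form:

$$A = E \Lambda E^*$$

where the diagonal elements of $\Lambda$ are real numbers (that is, E can be chosen unitary, so that $E^{-1} = E^*$). △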






3.3.4 Similarity 

Note that in the Jordan decomposition (or eigenvalue-eigenvector decomposition), we begin with a matrix A
and end up with a matrix J (respectively $\Lambda$) that reveals some intrinsic structure of A. In fact, this procedure
is a special case of what is known as a similarity transformation.

Definition 3.3.1 (Similarity transformation, similar matrices) Let $M \in \mathbb{C}^{n \times n}$ be an invertible matrix.
The map that takes a matrix $A \in \mathbb{C}^{n \times n}$ into $MAM^{-1}$ is called a similarity transformation.

Two matrices $A \in \mathbb{C}^{n \times n}$ and $\tilde{A} \in \mathbb{C}^{n \times n}$ are said to be similar if and only if there is a similarity
transformation M such that $\tilde{A} = MAM^{-1}$.

Example 3.3.2 (Eigenvalue-Eigenvector decomposition) Let

$$A = E \Lambda E^{-1}$$

be the eigenvalue-eigenvector decomposition of A. Then, A and $\Lambda$ are similar, and E is the similarity
transform. △

Proposition 3.3.1 (Eigen-structures are similarity-invariant) Suppose that $A \in \mathbb{C}^{n \times n}$ and $\tilde{A} \in \mathbb{C}^{n \times n}$
are similar. Then, the eigenvalues of A and $\tilde{A}$, together with their geometric and algebraic multiplicities, are the same. ■

Example 3.3.3 (Hermitian matrices) Let $A \in \mathbb{C}^{n \times n}$ and $\tilde{A} \in \mathbb{C}^{n \times n}$ be Hermitian matrices. Then, A and
$\tilde{A}$ are similar if and only if their eigenvalues are equal counting multiplicities. To see this, suppose that the
eigenvalues are the same. Apply the eigenvalue-eigenvector decomposition to A and $\tilde{A}$:

$$A = E \Lambda E^{-1} \quad \text{and} \quad \tilde{A} = \tilde{E} \Lambda \tilde{E}^{-1}$$

where the diagonal matrices of eigenvalues are the same, but the similarity transforms E and $\tilde{E}$ could be
different. Solving for $\Lambda$ from the first equation, we get

$$\Lambda = E^{-1} A E$$

Substituting for $\Lambda$ into the second equation gives

$$\tilde{A} = \tilde{E} E^{-1} A E \tilde{E}^{-1}$$

Now, define

$$M = \tilde{E} E^{-1}$$

and note that it is invertible and that $\tilde{A} = MAM^{-1}$. So, A and $\tilde{A}$ are similar. The reverse implication is
easy and follows from Proposition 3.3.1. △

The matrices $A_1$ and $A_2$ given in (3.1) have the same eigenvalues, but different Jordan structures. So, $A_1$ and $A_2$
are not similar.






3.4 Linear equations and optimization problems 

We present a complete analysis of systems of linear equations in this section. Although the presentation
considers equations involving matrices, it can be easily extended to the more general case of linear operators.
Consider the following linear system of equations:

$$Ax = b \tag{3.5}$$

where $A \in \mathbb{C}^{m \times n}$. We shall ask four questions:

1. Does (3.5) have a solution for a given b ? 

2. Does (3.5) have a solution for any b ? 

3. If a solution exists for a specific b, then is it unique ? 

4. If a solution exists for a specific b, then what is a characterization of all solutions (meaning find all 
solutions) ? 

The answer to the first question is the following. A solution exists if and only if b is in the range of A.
Recall that the range of A is the set of all points in the output space that can be reached from the input space
through the application of A. So, in order to have a solution (put differently, in order to reach b from the
input space), b must be in the range of A. This clearly indicates the answer to the second question which
asks if we can reach every point in the output space. A solution exists for any $b \in \mathbb{C}^m$ if and only if
$\mathcal{R}(A) = \mathbb{C}^m$, i.e. the range of A is equal to the entire output space. The third question can be answered
using linearity. All solutions can be written as the sum of a particular solution and elements of the null space
of A. This is because points in the null space of A "contribute nothing towards b". Note that if the null space
contains non-zero elements, then the solution is not unique. So, when a solution exists, it is unique if and
only if the null space is {0}. We summarize these in the following theorem:

Theorem 3.4.1 (Linear equations - existence and uniqueness) Let $A \in \mathbb{C}^{m \times n}$. The following statements
are true.

1. Let $b \in \mathbb{C}^m$ be given. The system (3.5) is solvable if and only if $b \in \mathcal{R}(A)$.

2. The system (3.5) is solvable for any $b \in \mathbb{C}^m$ if and only if $\mathcal{R}(A) = \mathbb{C}^m$.

3. Let $b \in \mathcal{R}(A)$. The system (3.5) has a unique solution if and only if $\mathcal{N}(A) = \{0\}$. If the null space
is not equal to {0}, then all solutions of (3.5) are of the form:

$$x = x_p + v$$

where v is in $\mathcal{N}(A)$ and $x_p \in \mathbb{C}^n$, called the particular solution, satisfies $Ax_p = b$.






This theorem applies to a variety of situations including linear operators. We had shown earlier how to 
compute a basis for the range and null space of a matrix using SVD (Theorem 3.3.4). So, the tests stated in 
the theorem can be conducted very efficiently. Our next objective is to state a computational version of the 
theorem. 

Definition 3.4.1 (Pseudo inverse and Moore-Penrose inverse) Let $A \in \mathbb{C}^{m \times n}$. A pseudo-inverse of A is
a matrix $B \in \mathbb{C}^{n \times m}$ satisfying

$$ABA = A \quad \text{and} \quad BAB = B \tag{3.6}$$

A pseudo-inverse B satisfying

$$(AB)^* = AB \quad \text{and} \quad (BA)^* = BA \tag{3.7}$$

is called the Moore-Penrose inverse of A and is denoted by $A^+$. ★

The Moore-Penrose inverse of a matrix always exists and is unique; but a pseudo-inverse is not necessarily
unique. When the matrix A is square and invertible, the pseudo- and Moore-Penrose inverses reduce to the
inverse $A^{-1}$. This is easily seen by taking $B = A^{-1}$ and verifying that the conditions in (3.6) and (3.7) hold.

Proposition 3.4.1 (Formula for Moore-Penrose inverse) Let the SVD of $A \in \mathbb{C}^{m \times n}$ be as in Theorem 3.3.3.
Partition the unitary matrices U and V as in Theorem 3.3.4. The Moore-Penrose inverse of
A is given by:

$$A^+ = V_1 \Sigma_1^{-1} U_1^*$$

where $\Sigma_1$, appearing in (3.4), is the r × r diagonal matrix whose diagonal entries are the singular values
of A. ■
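The formula of Proposition 3.4.1 can be tried out directly. The sketch below assumes NumPy, builds $A^+$ from a computed SVD, checks the four conditions (3.6) and (3.7), and compares with the library routine `np.linalg.pinv`; the rank threshold is a hypothetical choice.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [0.0, 1.0]])

U, s, Vh = np.linalg.svd(A)
r = int(np.sum(s > 1e-10))            # hypothetical threshold for the rank
U1 = U[:, :r]
V1 = Vh[:r, :].conj().T

# A^+ = V1 Sigma1^{-1} U1^*
A_plus = V1 @ np.diag(1.0 / s[:r]) @ U1.conj().T

# The four conditions (3.6) and (3.7).
checks = [
    np.allclose(A @ A_plus @ A, A),
    np.allclose(A_plus @ A @ A_plus, A_plus),
    np.allclose((A @ A_plus).conj().T, A @ A_plus),
    np.allclose((A_plus @ A).conj().T, A_plus @ A),
]
agrees_with_numpy = np.allclose(A_plus, np.linalg.pinv(A))
```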

We are now ready to summarize the existence theorem 3.4.1 into a computational scheme for solving the 
linear system of equations. 

Theorem 3.4.2 (Solver for linear equations) Let $A \in \mathbb{C}^{m \times n}$ and $b \in \mathbb{C}^m$ be given. The following
statements are equivalent.

1. There exists $x \in \mathbb{C}^n$ such that Ax = b.

2. $AA^+ b = b$

Moreover, if statement 2 holds, then all solutions are of the form:

$$\begin{aligned} x &= A^+ b + (I - A^+ A)v && \text{(3.8a)} \\ &= A^+ b + V_2 V_2^* v && \text{(3.8b)} \\ &= A^+ b + V_2 z && \text{(3.8c)} \end{aligned}$$

where v and z are arbitrary vectors and $A^+$ is the Moore-Penrose inverse of A. Further, the solution is
unique if $A^+ A = I$. ■






Thus, to solve a linear system, we check if statement 2 holds. If so, then there is at least one solution
and all solutions are given by the formula (3.8). On the other hand, if statement 2 does not hold, then no
solution exists. As given in the theorem, an SVD of A is needed to compute the Moore-Penrose inverse and
check statement 2. But, using the formula for $A^+$ given in Proposition 3.4.1 and the formula for A given in
statement 6 of Theorem 3.3.4, we can simplify the existence test to

$$U_1 U_1^* b = b$$

which does not involve computing the Moore-Penrose inverse explicitly. Similarly, the formula for all
solutions (3.8) can also be simplified as:

$$x = V_1 \Sigma_1^{-1} U_1^* b + V_2 V_2^* v$$

which involves fewer computations. When A is an arbitrary matrix, this is perhaps the best way (in terms of
computations and numerical stability) to solve a linear system. If A has some structure such as Hermitian,
then faster methods are available [6]. The solution forms given by (3.8a) and (3.8b) represent each solution
as an orthogonal sum and are useful in solving certain optimization problems. The form given by (3.8c) is
also useful in that it parameterizes the set of solutions with the minimum number of free variables.
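The scheme just described can be sketched in a few lines. The code below assumes NumPy; the function name `solve_linear_system` and the tolerance `tol` are hypothetical choices for this illustration, not part of the theorem.

```python
import numpy as np

def solve_linear_system(A, b, tol=1e-10):
    """Return (x_particular, V2) if Ax = b is solvable, else None.

    Every solution is then x_particular + V2 @ z for an arbitrary vector z,
    as in (3.8c). The tolerance `tol` is an assumption of this sketch.
    """
    U, s, Vh = np.linalg.svd(A)
    r = int(np.sum(s > tol))
    U1 = U[:, :r]
    V = Vh.conj().T
    V1, V2 = V[:, :r], V[:, r:]

    # Existence test: U1 U1^* b = b.
    if np.linalg.norm(U1 @ (U1.conj().T @ b) - b) > tol:
        return None

    # Particular solution V1 Sigma1^{-1} U1^* b.
    x_particular = V1 @ ((U1.conj().T @ b) / s[:r])
    return x_particular, V2

A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])
b_solvable = np.array([2.0, 3.0])

x_p, V2 = solve_linear_system(A, b_solvable)
z = np.array([0.7])                  # any value of z gives another solution
x_other = x_p + V2 @ z

A_singular = np.array([[1.0, 1.0],
                       [1.0, 1.0]])
no_solution = solve_linear_system(A_singular, np.array([1.0, 0.0]))
```

In the first system the null space is one-dimensional, so `V2` has a single column; in the second, b is not in the range of the matrix and the function reports that no solution exists.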

Every matrix has a Moore-Penrose inverse. So, we can perform the calculations indicated in the formula 
(3.8) for all solutions irrespective of whether the linear system has a solution or not. When the linear system 
has no solution, what is the meaning of x given by the formula (3.8) ? On the other hand, if the linear system 
has many solutions, which solution should we choose ? 




Figure 3.3: Minimization problem 

To answer the above questions, let us approach the formula (3.8) from another point of view. Pick any
$x \in \mathbb{C}^n$. Then, the quantity

$$e(x) = b - Ax$$

is the error committed by choosing x as a candidate for solution. The quantity

$$e(x)^* e(x) = (b - Ax)^* (b - Ax)$$

is the square error associated with choosing x as the solution. Typically, we would like to find an x that
minimizes this error. The situation is shown in Figure 3.3 and can be stated formally as:

$$\min_{x \in \mathbb{C}^n} \ (b - Ax)^* (b - Ax) \tag{3.9}$$






The square error function can be expanded as follows:

$$(b - Ax)^* (b - Ax) = b^* b + x^* (A^* A) x - (2 b^* A) x$$

where $b^* b$ is a constant (independent of the unknown variable x). The last term is written for real data; for
complex data it reads $-2\,\mathrm{Re}(b^* A x)$. Therefore, the minimization problem is equivalent to:

$$\min_{x \in \mathbb{C}^n} \ x^* Q x + r^* x \tag{3.10}$$

with $Q = A^* A$ and $r = -2A^* b$. This is an example of an unconstrained convex quadratic optimization
problem given in Chapter 2. Now, if there is a solution to the linear system Ax = b, then the optimal solution
of the minimization problem will also solve Ax = b.

Suppose that there is a solution to the linear system Ax = b. Then, the formula (3.8) gives all the solutions.
In this case, we might ask for the solution of smallest size. That is,

$$\min_{x \in \mathbb{C}^n} \ x^* x \quad \text{subject to } Ax = b \tag{3.11}$$

which is an example of an equality-constrained convex quadratic optimization problem (with Q = I and
r = 0). The motivation for this problem comes from the fact that, in engineering, the decision variable x
corresponds to physical variables which should be kept small.

Finally, we could ask for the smallest sized x that minimizes the error b − Ax. Here, the size of x is given
by $x^* x$. The solutions to all these problems are stated below.

Theorem 3.4.3 (Solution of optimization problems) Let $A \in \mathbb{C}^{m \times n}$ and $b \in \mathbb{C}^m$ be given. The following
statements are true.

1. Any x of the form (3.8) is a solution of the unconstrained optimization problem (3.9). The optimal
cost is given by

$$b^* (I - AA^+) b$$

2. The constrained optimization problem (3.11) has a solution if and only if $AA^+ b = b$. If it has a
solution, then the optimal solution is unique and it is given by $x = A^+ b$.

3. $x = A^+ b$ is the smallest-sized x that minimizes the square error $(b - Ax)^* (b - Ax)$. ■
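Statement 1 can be illustrated on a small overdetermined system that has no exact solution. The sketch below assumes NumPy and real data chosen only for illustration; it compares the attained square error with the predicted optimal cost and cross-checks the minimizer against the library least-squares routine `np.linalg.lstsq`.

```python
import numpy as np

# Overdetermined system with no exact solution.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 0.0, 2.0])

A_plus = np.linalg.pinv(A)
x_star = A_plus @ b                  # candidate minimizer x = A^+ b

error = b - A @ x_star
square_error = error.conj() @ error

# Optimal cost predicted by statement 1: b^* (I - A A^+) b.
predicted_cost = b.conj() @ (np.eye(3) - A @ A_plus) @ b

# Cross-check against NumPy's least-squares routine.
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
```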

3.5 A collection of important results 

We conclude this chapter with a number of results that are frequently used. 

Theorem 3.5.1 (Properties of range and null space) Let $A \in \mathbb{C}^{m \times n}$. The following statements are true.

1. The direct sum of the range of A and the null space of $A^*$ is $\mathbb{C}^m$, i.e.,

$$\mathcal{R}(A) \oplus \mathcal{N}(A^*) = \mathbb{C}^m$$

Moreover, the sum of the dimensions of $\mathcal{R}(A)$ and $\mathcal{N}(A^*)$ is equal to m.

2. The direct sum of the range of $A^*$ and the null space of A is $\mathbb{C}^n$, i.e.,

$$\mathcal{R}(A^*) \oplus \mathcal{N}(A) = \mathbb{C}^n$$

Moreover, the sum of the dimensions of $\mathcal{R}(A^*)$ and $\mathcal{N}(A)$ is equal to n.



This theorem is an example of how a matrix may split a vector space into subspaces. Statement 1 says
that the vector space $\mathbb{C}^m$ may be thought of as the direct sum of two subspaces generated by a matrix A.
Later on, we shall see how to split a vector space into a direct sum of subspaces with additional features.
An arbitrary matrix A is used in the theorem and, hence, the statements contain A and $A^*$. When A is
Hermitian, the statements simplify to a single statement:

$$\mathcal{R}(A) \oplus \mathcal{N}(A) = \mathbb{C}^n$$

Thus, the range space and null space of a Hermitian matrix are orthogonal complements of each other.

The theorem has a geometric flavor and has many consequences. For example, let r be the rank of
$A \in \mathbb{C}^{m \times n}$. Statement 1 and the definition of rank give:

$$r + \dim \mathcal{N}(A^*) = m \implies r \leq m$$

which shows that the rank of a matrix is less than or equal to the number of rows. We had mentioned earlier
that, as a consequence of the interpretation of range as the column span, rank is less than or equal to the
number of columns.

Proposition 3.5.1 (Some consequences of Theorem 3.5.1) Let $A \in \mathbb{C}^{m \times n}$ and $B \in \mathbb{C}^{n \times k}$ be of ranks $r_A$
and $r_B$ respectively. The following statements are true.

1. $r_A \leq \min\{m, n\}$

2. $r_A$ is equal to the rank of $A^*$

3. $r_A$ is equal to the maximum number of linearly independent rows (or columns) of A

4. $r_A$ is equal to the rank of $AA^*$

5. Rank of AB is less than or equal to the minimum of $r_A$ and $r_B$ ■
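Statements 2, 4 and 5 lend themselves to a quick numerical check. The sketch below assumes NumPy's `np.linalg.matrix_rank`, which estimates rank from singular values with a default tolerance; the matrices are small illustrative choices.

```python
import numpy as np

rank = np.linalg.matrix_rank

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])        # 2x3, second row is twice the first
B = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])             # 3x2, linearly independent columns

r_A, r_B = rank(A), rank(B)

same_as_adjoint = rank(A.conj().T) == r_A          # statement 2
same_as_gram = rank(A @ A.conj().T) == r_A         # statement 4
product_bound = rank(A @ B) <= min(r_A, r_B)       # statement 5
```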






We already mentioned the simplification that occurs when A is Hermitian. Even in the general case of square
matrices, range space and null space have a curious property. Let A be a square matrix and $x \in \mathcal{R}(A)$. Then,
Ax is also in $\mathcal{R}(A)$ because, after all, the range of A is by definition the set of all vectors of the form
Ax. This property, called invariance under A, leads to the theory of invariant subspaces.

Definition 3.5.1 (Invariant subspace) Let $A \in \mathbb{C}^{n \times n}$. A subspace S of $\mathbb{C}^n$ with the property $AS \subseteq S$,
that is,

$$Ax \in S \quad \text{whenever } x \in S$$

is called an A-invariant subspace. ★

Example 3.5.1 (Eigen-subspace) Let $A \in \mathbb{C}^{n \times n}$ and $\lambda$ be an eigenvalue of A. The eigen-subspace of A
associated with the eigenvalue $\lambda$:

$$E_\lambda = \mathrm{Span}\{\text{linearly independent eigenvectors associated with } \lambda\} = \{x : (\lambda I - A)x = 0\}$$

is an A-invariant subspace.

Suppose that $\{\lambda_i\}_{i=1}^{k}$ are eigenvalues of A with the eigen-subspaces $E_{\lambda_i}$. Then, the direct sum:

$$\bigoplus_{i=1}^{k} E_{\lambda_i}$$

is also an A-invariant subspace. △
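The invariance of the eigen-subspace can be verified directly from its description as a null space. The short derivation below is a sketch that uses only the fact that A commutes with $\lambda I - A$.

```latex
x \in E_\lambda
\;\Longrightarrow\;
(\lambda I - A)(Ax) = A(\lambda I - A)x = A \cdot 0 = 0
\;\Longrightarrow\;
Ax \in E_\lambda .
```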



